* [PATCH v2 0/9] exfat: convert to iomap
@ 2026-05-07 12:42 Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 1/9] exfat: replace unsafe macros with static inline functions Namjae Jeon
` (8 more replies)
0 siblings, 9 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
This patch series converts the exfat filesystem to the iomap framework for
buffered I/O, direct I/O, and llseek (SEEK_HOLE/SEEK_DATA) support.
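
For readers unfamiliar with the llseek part: SEEK_DATA and SEEK_HOLE let userspace walk the allocated regions of a file. The sketch below is a userspace illustration only, not code from this series. The helper name count_data_extents() is hypothetical, and the fallback values for SEEK_DATA/SEEK_HOLE are the Linux ones, used only if the libc headers do not expose them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only when the headers did not define them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: count the data extents of an open file by
 * alternating SEEK_DATA and SEEK_HOLE. Returns -1 on error.
 */
long count_data_extents(int fd)
{
	long extents = 0;
	off_t pos = 0;

	for (;;) {
		/* Next offset >= pos that contains data. */
		off_t data = lseek(fd, pos, SEEK_DATA);

		if (data < 0) {
			if (errno == ENXIO)
				break;	/* no data at or after pos */
			return -1;
		}

		/* End of this data run; EOF counts as an implicit hole. */
		off_t hole = lseek(fd, data, SEEK_HOLE);

		if (hole < 0)
			return -1;

		extents++;
		pos = hole;
	}
	return extents;
}
```

On a filesystem without hole support the whole file is reported as a single data extent, so the loop still terminates.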
iozone benchmark results (4KB cluster size, -s1g -r64k: 1GB file, 64KB record size)

                         1 thread           4 threads
                      Write    Read       Write    Read
                     (MB/s)  (MB/s)      (MB/s)  (MB/s)
-----------------------------------------------------------------
exfat + iomap patch   332.7   418.1        78.6    82.4
Current exfat         278.4   415.1        42.1    38.0
-----------------------------------------------------------------
Improvement          +19.5%   +0.7%      +86.7% +117.4%
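
The improvement row is presumably the relative change of the patched throughput over the baseline. A minimal sketch of that arithmetic, with a hypothetical helper name, checked here only against the two write columns:

```c
/*
 * Hypothetical helper: relative improvement in percent of a patched
 * throughput figure over a baseline figure (both in MB/s).
 */
static double improvement_pct(double patched, double baseline)
{
	return (patched - baseline) / baseline * 100.0;
}
```

For example, improvement_pct(332.7, 278.4) is roughly 19.5 and improvement_pct(78.6, 42.1) is roughly 86.7.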
v2:
- Replace macros with static inline functions.
- Remove noop_direct_IO.
- Consolidate read and write iomap_begin into __exfat_iomap_begin().
- Zero out stale data in the straddle block beyond valid_size.
- Introduce IOMAP_F_ZERO_TAIL flag to remove exfat_iomap_put_folio().
- Select FS_IOMAP in Kconfig.
- Remove unnecessary alignment check in exfat_file_read_iter().
- Just use generic_file_llseek directly.
- Move exfat_extend_valid_size to exfat_file_write_iter().
- Use pagecache_isize_extended() to remove iomap_zero_range in
  exfat_setattr().
- Fix mmap write data corruption with byte-by-byte fallocate.
- Remove exfat_mkwrite_iomap_begin().
Namjae Jeon (9):
exfat: replace unsafe macros with static inline functions
exfat: add balloc parameter to exfat_map_cluster() for iomap support
exfat: add exfat_file_open()
exfat: add support for multi-cluster allocation
iomap: introduce IOMAP_F_ZERO_TAIL flag
exfat: add data_start_bytes and exfat_cluster_to_phys() helper
exfat: add iomap buffered I/O support
exfat: add iomap direct I/O support
exfat: add support for SEEK_HOLE and SEEK_DATA in llseek
fs/exfat/Kconfig | 2 +-
fs/exfat/Makefile | 2 +-
fs/exfat/balloc.c | 2 +-
fs/exfat/dir.c | 50 +++---
fs/exfat/exfat_fs.h | 140 ++++++++++++-----
fs/exfat/fatent.c | 30 ++--
fs/exfat/file.c | 253 +++++++++++++++++++++++-------
fs/exfat/inode.c | 342 ++++++-----------------------------------
fs/exfat/iomap.c | 247 +++++++++++++++++++++++++++++
fs/exfat/iomap.h | 15 ++
fs/exfat/namei.c | 28 ++--
fs/exfat/super.c | 5 +-
fs/iomap/buffered-io.c | 4 +
include/linux/iomap.h | 4 +
14 files changed, 678 insertions(+), 446 deletions(-)
create mode 100644 fs/exfat/iomap.c
create mode 100644 fs/exfat/iomap.h
--
2.25.1
* [PATCH v2 1/9] exfat: replace unsafe macros with static inline functions
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
2026-05-07 13:41 ` CharSyam
2026-05-07 12:42 ` [PATCH v2 2/9] exfat: add balloc parameter to exfat_map_cluster() for iomap support Namjae Jeon
` (7 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
The current exFAT driver relies on various macros for unit conversions
between clusters, blocks, sectors, and directory entries. These macros
are unsafe: they enforce no types on their arguments and their bit shifts
can overflow, especially on 64-bit architectures. Replace all arithmetic
macros with static inline functions to provide strict type checking and
explicit casting.
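
To illustrate the hazard, here is a userspace model of the old and new styles. It is a sketch, not the kernel code: the macro and function names are hypothetical, int64_t stands in for loff_t, and the shift width is passed directly instead of being read from sbi.

```c
#include <stdint.h>

/*
 * Old style: the result type follows the argument type, so a 32-bit
 * cluster count is shifted in 32 bits and silently wraps.
 */
#define CLU_TO_B_MACRO(n, bits)		((n) << (bits))

/* Old style round-up: wraps to a huge value when b is an unsigned 0. */
#define B_TO_CLU_ROUND_UP_MACRO(b, bits) \
	((((b) - 1) >> (bits)) + 1)

/* New style: widen to 64 bits before shifting. */
static inline int64_t cluster_to_bytes(unsigned int cluster_size_bits,
				       uint32_t nr_clusters)
{
	return (int64_t)nr_clusters << cluster_size_bits;
}

/* New style round-up: a size of zero (or less) maps to zero clusters. */
static inline uint32_t bytes_to_cluster_round_up(unsigned int cluster_size_bits,
						 int64_t size)
{
	if (size <= 0)
		return 0;
	return (uint32_t)((size - 1) >> cluster_size_bits) + 1;
}
```

With a 32KB cluster (15 bits) and a uint32_t count of 1 << 20 clusters, the macro yields 0 while the inline function yields 32GB. With an unsigned zero size, the round-up macro yields 131072 clusters while the inline function yields 0.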
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/exfat/balloc.c | 2 +-
fs/exfat/dir.c | 48 ++++++++---------
fs/exfat/exfat_fs.h | 122 ++++++++++++++++++++++++++++++++------------
fs/exfat/fatent.c | 4 +-
fs/exfat/file.c | 8 +--
fs/exfat/inode.c | 19 +++----
fs/exfat/namei.c | 26 +++++-----
fs/exfat/super.c | 4 +-
8 files changed, 147 insertions(+), 86 deletions(-)
diff --git a/fs/exfat/balloc.c b/fs/exfat/balloc.c
index 625f2f14d4fe..e66ebf899778 100644
--- a/fs/exfat/balloc.c
+++ b/fs/exfat/balloc.c
@@ -112,7 +112,7 @@ static int exfat_allocate_bitmap(struct super_block *sb,
}
if (exfat_test_bitmap_range(sb, sbi->map_clu,
- EXFAT_B_TO_CLU_ROUND_UP(map_size, sbi)) == false)
+ exfat_bytes_to_cluster_round_up(sbi, map_size)) == false)
goto err_out;
return 0;
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index ac008ccaa97d..ca9d707220df 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -76,7 +76,7 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
struct super_block *sb = inode->i_sb;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
struct exfat_inode_info *ei = EXFAT_I(inode);
- unsigned int dentry = EXFAT_B_TO_DEN(*cpos) & 0xFFFFFFFF;
+ unsigned int dentry = exfat_bytes_to_dentries(*cpos) & 0xFFFFFFFF;
struct buffer_head *bh;
/* check if the given file ID is opened */
@@ -84,13 +84,13 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
return -EPERM;
exfat_chain_set(&dir, ei->start_clu,
- EXFAT_B_TO_CLU(i_size_read(inode), sbi), ei->flags);
+ exfat_bytes_to_cluster(sbi, i_size_read(inode)), ei->flags);
dentries_per_clu = sbi->dentries_per_clu;
- max_dentries = (unsigned int)min_t(u64, MAX_EXFAT_DENTRIES,
- (u64)EXFAT_CLU_TO_DEN(sbi->num_clusters, sbi));
+ max_dentries = min(MAX_EXFAT_DENTRIES,
+ exfat_cluster_to_dentries(sbi, sbi->num_clusters));
- clu_offset = EXFAT_DEN_TO_CLU(dentry, sbi);
+ clu_offset = exfat_dentries_to_cluster(sbi, dentry);
exfat_chain_dup(&clu, &dir);
if (clu.flags == ALLOC_FAT_CHAIN) {
@@ -147,10 +147,10 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
dir_entry->dir = clu;
brelse(bh);
- ei->hint_bmap.off = EXFAT_DEN_TO_CLU(dentry, sbi);
+ ei->hint_bmap.off = exfat_dentries_to_cluster(sbi, dentry);
ei->hint_bmap.clu = clu.dir;
- *cpos = EXFAT_DEN_TO_B(dentry + 1 + num_ext);
+ *cpos = exfat_dentries_to_bytes(dentry + 1 + num_ext);
return 0;
}
@@ -160,7 +160,7 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
out:
dir_entry->namebuf.lfn[0] = '\0';
- *cpos = EXFAT_DEN_TO_B(dentry);
+ *cpos = exfat_dentries_to_bytes(dentry);
return 0;
}
@@ -465,7 +465,7 @@ static void exfat_free_benign_secondary_clusters(struct inode *inode,
return;
exfat_chain_set(&dir, start_clu,
- EXFAT_B_TO_CLU_ROUND_UP(size, EXFAT_SB(sb)),
+ exfat_bytes_to_cluster_round_up(EXFAT_SB(sb), size),
flags);
exfat_free_cluster(inode, &dir);
}
@@ -556,10 +556,11 @@ static int exfat_find_location(struct super_block *sb, struct exfat_chain *p_dir
unsigned int off, clu = 0;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
- off = EXFAT_DEN_TO_B(entry);
+ off = exfat_dentries_to_bytes(entry);
clu = p_dir->dir;
- ret = exfat_cluster_walk(sb, &clu, EXFAT_B_TO_CLU(off, sbi), p_dir->flags);
+ ret = exfat_cluster_walk(sb, &clu, exfat_bytes_to_cluster(sbi, off),
+ p_dir->flags);
if (ret)
return ret;
@@ -567,7 +568,7 @@ static int exfat_find_location(struct super_block *sb, struct exfat_chain *p_dir
exfat_fs_error(sb,
"unexpected early break in cluster chain (clu : %u, len : %d)",
p_dir->dir,
- EXFAT_B_TO_CLU(off, sbi));
+ exfat_bytes_to_cluster(sbi, off));
return -EIO;
}
@@ -577,13 +578,13 @@ static int exfat_find_location(struct super_block *sb, struct exfat_chain *p_dir
}
/* byte offset in cluster */
- off = EXFAT_CLU_OFFSET(off, sbi);
+ off = exfat_cluster_offset(sbi, off);
/* byte offset in sector */
- *offset = EXFAT_BLK_OFFSET(off, sb);
+ *offset = exfat_block_offset(sb, off);
/* sector offset in cluster */
- *sector = EXFAT_B_TO_BLK(off, sb);
+ *sector = exfat_bytes_to_block(sb, off);
*sector += exfat_cluster_to_sector(sbi, clu);
return 0;
}
@@ -593,7 +594,7 @@ struct exfat_dentry *exfat_get_dentry(struct super_block *sb,
{
struct exfat_sb_info *sbi = EXFAT_SB(sb);
unsigned int sect_per_clus = sbi->sect_per_clus;
- unsigned int dentries_per_page = EXFAT_B_TO_DEN(PAGE_SIZE);
+ unsigned int dentries_per_page = exfat_bytes_to_dentries(PAGE_SIZE);
int off;
sector_t sec;
@@ -672,8 +673,8 @@ struct exfat_dentry *exfat_get_dentry_cached(
struct exfat_entry_set_cache *es, int num)
{
int off = es->start_off + num * DENTRY_SIZE;
- struct buffer_head *bh = es->bh[EXFAT_B_TO_BLK(off, es->sb)];
- char *p = bh->b_data + EXFAT_BLK_OFFSET(off, es->sb);
+ struct buffer_head *bh = es->bh[exfat_bytes_to_block(es->sb, off)];
+ char *p = bh->b_data + exfat_block_offset(es->sb, off);
return (struct exfat_dentry *)p;
}
@@ -741,7 +742,7 @@ static int __exfat_get_dentry_set(struct exfat_entry_set_cache *es,
es->num_entries = num_entries;
- num_bh = EXFAT_B_TO_BLK_ROUND_UP(off + num_entries * DENTRY_SIZE, sb);
+ num_bh = exfat_bytes_to_block_round_up(sb, off + num_entries * DENTRY_SIZE);
if (num_bh > ARRAY_SIZE(es->__bh)) {
es->bh = kmalloc_objs(*es->bh, num_bh, GFP_NOFS);
if (!es->bh) {
@@ -830,7 +831,7 @@ static int exfat_validate_empty_dentry_set(struct exfat_entry_set_cache *es)
err_used_follow_unused:
off = es->start_off + (i << DENTRY_SIZE_BITS);
- bh = es->bh[EXFAT_B_TO_BLK(off, es->sb)];
+ bh = es->bh[exfat_bytes_to_block(es->sb, off)];
exfat_fs_error(es->sb,
"in sector %lld, dentry %d should be unused, but 0x%x",
@@ -839,7 +840,8 @@ static int exfat_validate_empty_dentry_set(struct exfat_entry_set_cache *es)
return -EIO;
count_skip_entries:
- es->num_entries = EXFAT_B_TO_DEN(EXFAT_BLK_TO_B(es->num_bh, es->sb) - es->start_off);
+ es->num_entries =
+ exfat_bytes_to_dentries(exfat_block_to_bytes(es->sb, es->num_bh) - es->start_off);
for (; i < es->num_entries; i++) {
ep = exfat_get_dentry_cached(es, i);
if (IS_EXFAT_DELETED(ep->type))
@@ -892,7 +894,7 @@ static inline void exfat_set_empty_hint(struct exfat_inode_info *ei,
{
if (ei->hint_femp.eidx == EXFAT_HINT_NONE ||
ei->hint_femp.eidx > dentry) {
- int total_entries = EXFAT_B_TO_DEN(i_size_read(&ei->vfs_inode));
+ int total_entries = exfat_bytes_to_dentries(i_size_read(&ei->vfs_inode));
if (candi_empty->count == 0) {
candi_empty->cur = *clu;
@@ -1215,7 +1217,7 @@ static int exfat_get_volume_label_dentry(struct super_block *sb,
es->bh = es->__bh;
es->bh[0] = bh;
es->num_bh = 1;
- es->start_off = EXFAT_DEN_TO_B(i) % sb->s_blocksize;
+ es->start_off = exfat_dentries_to_bytes(i) % sb->s_blocksize;
return 0;
}
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 89ef5368277f..9c8ab3df7a42 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -84,38 +84,6 @@ enum {
(min_t(blkcnt_t, (sb)->s_bdi->ra_pages, (sb)->s_bdi->io_pages) \
<< (PAGE_SHIFT - (sb)->s_blocksize_bits))
-/*
- * helpers for cluster size to byte conversion.
- */
-#define EXFAT_CLU_TO_B(b, sbi) ((b) << (sbi)->cluster_size_bits)
-#define EXFAT_B_TO_CLU(b, sbi) ((b) >> (sbi)->cluster_size_bits)
-#define EXFAT_B_TO_CLU_ROUND_UP(b, sbi) \
- (((b - 1) >> (sbi)->cluster_size_bits) + 1)
-#define EXFAT_CLU_OFFSET(off, sbi) ((off) & ((sbi)->cluster_size - 1))
-
-/*
- * helpers for block size to byte conversion.
- */
-#define EXFAT_BLK_TO_B(b, sb) ((b) << (sb)->s_blocksize_bits)
-#define EXFAT_B_TO_BLK(b, sb) ((b) >> (sb)->s_blocksize_bits)
-#define EXFAT_B_TO_BLK_ROUND_UP(b, sb) \
- (((b - 1) >> (sb)->s_blocksize_bits) + 1)
-#define EXFAT_BLK_OFFSET(off, sb) ((off) & ((sb)->s_blocksize - 1))
-
-/*
- * helpers for block size to dentry size conversion.
- */
-#define EXFAT_B_TO_DEN(b) ((b) >> DENTRY_SIZE_BITS)
-#define EXFAT_DEN_TO_B(b) ((b) << DENTRY_SIZE_BITS)
-
-/*
- * helpers for cluster size to dentry size conversion.
- */
-#define EXFAT_CLU_TO_DEN(clu, sbi) \
- ((clu) << ((sbi)->cluster_size_bits - DENTRY_SIZE_BITS))
-#define EXFAT_DEN_TO_CLU(dentry, sbi) \
- ((dentry) >> ((sbi)->cluster_size_bits - DENTRY_SIZE_BITS))
-
/*
* helpers for fat entry.
*/
@@ -149,7 +117,7 @@ enum {
* The 608 bytes are in 3 sectors at most (even 512 Byte sector).
*/
#define DIR_CACHE_SIZE \
- (DIV_ROUND_UP(EXFAT_DEN_TO_B(ES_MAX_ENTRY_NUM), SECTOR_SIZE) + 1)
+ (DIV_ROUND_UP(ES_MAX_ENTRY_NUM << DENTRY_SIZE_BITS, SECTOR_SIZE) + 1)
/* Superblock flags */
#define EXFAT_FLAGS_SHUTDOWN 1
@@ -432,6 +400,94 @@ static inline loff_t exfat_ondisk_size(const struct inode *inode)
return ((loff_t)inode->i_blocks) << 9;
}
+/*
+ * helpers for cluster size to byte conversion.
+ */
+static inline loff_t exfat_cluster_to_bytes(struct exfat_sb_info *sbi,
+ u32 nr_clusters)
+{
+ return (loff_t)nr_clusters << sbi->cluster_size_bits;
+}
+
+static inline blkcnt_t exfat_cluster_to_sectors(struct exfat_sb_info *sbi,
+ u32 nr_clusters)
+{
+ return (blkcnt_t)nr_clusters << (sbi->cluster_size_bits - 9);
+}
+
+static inline u32 exfat_bytes_to_cluster(struct exfat_sb_info *sbi, loff_t size)
+{
+ return (u32)(size >> sbi->cluster_size_bits);
+}
+
+static inline u32 exfat_bytes_to_cluster_round_up(struct exfat_sb_info *sbi,
+ loff_t size)
+{
+ if (size <= 0)
+ return 0;
+ return (u32)((size - 1) >> sbi->cluster_size_bits) + 1;
+}
+
+static inline u32 exfat_cluster_offset(struct exfat_sb_info *sbi, loff_t off)
+{
+ return off & (sbi->cluster_size - 1);
+}
+
+/*
+ * helpers for block size to byte conversion.
+ */
+static inline loff_t exfat_block_to_bytes(struct super_block *sb,
+ sector_t block)
+{
+ return (loff_t)block << sb->s_blocksize_bits;
+}
+
+static inline sector_t exfat_bytes_to_block(struct super_block *sb, loff_t size)
+{
+ return (sector_t)(size >> sb->s_blocksize_bits);
+}
+
+static inline sector_t exfat_bytes_to_block_round_up(struct super_block *sb,
+ loff_t size)
+{
+ if (size <= 0)
+ return 0;
+ return (sector_t)(((size - 1) >> sb->s_blocksize_bits) + 1);
+}
+
+static inline u32 exfat_block_offset(struct super_block *sb, loff_t off)
+{
+ return (u32)(off & (sb->s_blocksize - 1));
+}
+
+/*
+ * helpers for block size to dentry size conversion.
+ */
+static inline u32 exfat_bytes_to_dentries(loff_t b)
+{
+ return (u32)(b >> DENTRY_SIZE_BITS);
+}
+
+static inline u32 exfat_dentries_to_bytes(u32 dentry)
+{
+ return dentry << DENTRY_SIZE_BITS;
+}
+
+/*
+ * helpers for cluster size to dentry size conversion.
+ */
+static inline u32 exfat_cluster_to_dentries(struct exfat_sb_info *sbi,
+ u32 nr_clusters)
+{
+ return nr_clusters << (sbi->cluster_size_bits - DENTRY_SIZE_BITS);
+}
+
+static inline u32 exfat_dentries_to_cluster(struct exfat_sb_info *sbi,
+ u32 dentry)
+{
+ return dentry >> (sbi->cluster_size_bits - DENTRY_SIZE_BITS);
+}
+
/* super.c */
int exfat_set_volume_dirty(struct super_block *sb);
int exfat_clear_volume_dirty(struct super_block *sb);
diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
index dce0955e689a..45b0b754a2e4 100644
--- a/fs/exfat/fatent.c
+++ b/fs/exfat/fatent.c
@@ -412,8 +412,8 @@ int exfat_zeroed_cluster(struct inode *dir, unsigned int clu)
if (IS_DIRSYNC(dir))
return sync_blockdev_range(sb->s_bdev,
- EXFAT_BLK_TO_B(blknr, sb),
- EXFAT_BLK_TO_B(last_blknr, sb) - 1);
+ exfat_block_to_bytes(sb, blknr),
+ exfat_block_to_bytes(sb, last_blknr) - 1);
return 0;
}
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 354bdcfe4abc..29a36a80e29b 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -33,9 +33,9 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
if (ret)
return ret;
- num_clusters = EXFAT_B_TO_CLU(exfat_ondisk_size(inode), sbi);
+ num_clusters = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
/* integer overflow is already checked in inode_newsize_ok(). */
- new_num_clusters = EXFAT_B_TO_CLU_ROUND_UP(size, sbi);
+ new_num_clusters = exfat_bytes_to_cluster_round_up(sbi, size);
if (new_num_clusters == num_clusters)
goto out;
@@ -200,8 +200,8 @@ int __exfat_truncate(struct inode *inode)
exfat_set_volume_dirty(sb);
- num_clusters_new = EXFAT_B_TO_CLU_ROUND_UP(i_size_read(inode), sbi);
- num_clusters_phys = EXFAT_B_TO_CLU(exfat_ondisk_size(inode), sbi);
+ num_clusters_new = exfat_bytes_to_cluster_round_up(sbi, i_size_read(inode));
+ num_clusters_phys = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
exfat_chain_set(&clu, ei->start_clu, num_clusters_phys, ei->flags);
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 1ea4c740fef9..d25e0a69865b 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -135,7 +135,7 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
unsigned int local_clu_offset = clu_offset;
unsigned int num_to_be_allocated = 0, num_clusters;
- num_clusters = EXFAT_B_TO_CLU(exfat_ondisk_size(inode), sbi);
+ num_clusters = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
if (clu_offset >= num_clusters)
num_to_be_allocated = clu_offset - num_clusters + 1;
@@ -216,7 +216,8 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
*clu = new_clu.dir;
- inode->i_blocks += EXFAT_CLU_TO_B(num_to_be_allocated, sbi) >> 9;
+ inode->i_blocks +=
+ exfat_cluster_to_sectors(sbi, num_to_be_allocated) >> 9;
/*
* Move *clu pointer along FAT chains (hole care) because the
@@ -254,12 +255,12 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
mutex_lock(&sbi->s_lock);
i_size = i_size_read(inode);
- last_block = EXFAT_B_TO_BLK_ROUND_UP(i_size, sb);
+ last_block = exfat_bytes_to_block_round_up(sb, i_size);
if (iblock >= last_block && !create)
goto done;
/* Is this block already allocated? */
- count = EXFAT_B_TO_CLU_ROUND_UP(bh_result->b_size, sbi);
+ count = exfat_bytes_to_cluster_round_up(sbi, bh_result->b_size);
err = exfat_map_cluster(inode, iblock >> sbi->sect_per_clus_bits,
&cluster, &count, create);
if (err) {
@@ -296,9 +297,9 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
* care the last nested block if valid_size is not equal to i_size.
*/
if (i_size == ei->valid_size || create || !bh_result->b_folio)
- valid_blks = EXFAT_B_TO_BLK_ROUND_UP(ei->valid_size, sb);
+ valid_blks = exfat_bytes_to_block_round_up(sb, ei->valid_size);
else
- valid_blks = EXFAT_B_TO_BLK(ei->valid_size, sb);
+ valid_blks = exfat_bytes_to_block(sb, ei->valid_size);
/* The range has been fully written, map it */
if (iblock + max_blocks < valid_blks)
@@ -313,7 +314,7 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
/* The area has not been written, map and mark as new for create case */
if (create) {
set_buffer_new(bh_result);
- ei->valid_size = EXFAT_BLK_TO_B(iblock + max_blocks, sb);
+ ei->valid_size = exfat_block_to_bytes(sb, iblock + max_blocks);
mark_inode_dirty(inode);
goto done;
}
@@ -343,7 +344,7 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
goto done;
}
- pos = EXFAT_BLK_TO_B(iblock, sb);
+ pos = exfat_block_to_bytes(sb, iblock);
size = ei->valid_size - pos;
addr = folio_address(bh_result->b_folio) +
offset_in_folio(bh_result->b_folio, pos);
@@ -374,7 +375,7 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
*/
clear_buffer_mapped(bh_result);
done:
- bh_result->b_size = EXFAT_BLK_TO_B(max_blocks, sb);
+ bh_result->b_size = exfat_block_to_bytes(sb, max_blocks);
if (err < 0)
clear_buffer_mapped(bh_result);
unlock_ret:
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 2c5636634b4a..752fbec9316b 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -208,7 +208,7 @@ static int exfat_search_empty_slot(struct super_block *sb,
int dentries_per_clu;
struct exfat_chain clu;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
- int total_entries = EXFAT_CLU_TO_DEN(p_dir->size, sbi);
+ unsigned int total_entries = exfat_cluster_to_dentries(sbi, p_dir->size);
dentries_per_clu = sbi->dentries_per_clu;
@@ -266,7 +266,7 @@ static int exfat_search_empty_slot(struct super_block *sb,
static int exfat_check_max_dentries(struct inode *inode)
{
- if (EXFAT_B_TO_DEN(i_size_read(inode)) >= MAX_EXFAT_DENTRIES) {
+ if (exfat_bytes_to_dentries(i_size_read(inode)) >= MAX_EXFAT_DENTRIES) {
/*
* exFAT spec allows a dir to grow up to 8388608(256MB)
* dentries
@@ -314,7 +314,8 @@ int exfat_find_empty_entry(struct inode *inode,
}
exfat_chain_set(p_dir, ei->start_clu,
- EXFAT_B_TO_CLU(i_size_read(inode), sbi), ei->flags);
+ exfat_bytes_to_cluster(sbi, i_size_read(inode)),
+ ei->flags);
while ((dentry = exfat_search_empty_slot(sb, &hint_femp, p_dir,
num_entries, es)) < 0) {
@@ -375,7 +376,7 @@ int exfat_find_empty_entry(struct inode *inode,
hint_femp.cur.size++;
p_dir->size++;
- size = EXFAT_CLU_TO_B(p_dir->size, sbi);
+ size = exfat_cluster_to_bytes(sbi, p_dir->size);
/* directory inode should be updated in here */
i_size_write(inode, size);
@@ -604,7 +605,7 @@ static int exfat_find(struct inode *dir, const struct qstr *qname,
return ret;
exfat_chain_set(&cdir, ei->start_clu,
- EXFAT_B_TO_CLU(i_size_read(dir), sbi), ei->flags);
+ exfat_bytes_to_cluster(sbi, i_size_read(dir)), ei->flags);
/* check the validation of hint_stat and initialize it if required */
if (ei->version != (inode_peek_iversion_raw(dir) & 0xffffffff)) {
@@ -681,7 +682,7 @@ static int exfat_find(struct inode *dir, const struct qstr *qname,
return -EIO;
}
- if (unlikely(EXFAT_B_TO_CLU_ROUND_UP(info->size, sbi) > sbi->used_clusters)) {
+ if (unlikely(exfat_bytes_to_cluster_round_up(sbi, info->size) > sbi->used_clusters)) {
exfat_fs_error(sb, "data size is invalid(%lld)", info->size);
return -EIO;
}
@@ -695,7 +696,8 @@ static int exfat_find(struct inode *dir, const struct qstr *qname,
if (info->type == TYPE_DIR) {
exfat_chain_set(&cdir, info->start_clu,
- EXFAT_B_TO_CLU(info->size, sbi), info->flags);
+ exfat_bytes_to_cluster(sbi, info->size),
+ info->flags);
count = exfat_count_dir_entries(sb, &cdir);
if (count < 0)
return -EIO;
@@ -951,7 +953,7 @@ static int exfat_rmdir(struct inode *dir, struct dentry *dentry)
}
exfat_chain_set(&clu_to_free, ei->start_clu,
- EXFAT_B_TO_CLU_ROUND_UP(i_size_read(inode), sbi), ei->flags);
+ exfat_bytes_to_cluster_round_up(sbi, i_size_read(inode)), ei->flags);
err = exfat_check_dir_empty(sb, &clu_to_free);
if (err) {
@@ -1158,8 +1160,8 @@ static int __exfat_rename(struct inode *old_parent_inode,
new_clu.dir = new_ei->start_clu;
new_clu.size =
- EXFAT_B_TO_CLU_ROUND_UP(i_size_read(new_inode),
- sbi);
+ exfat_bytes_to_cluster_round_up(sbi,
+ i_size_read(new_inode));
new_clu.flags = new_ei->flags;
ret = exfat_check_dir_empty(sb, &new_clu);
@@ -1203,8 +1205,8 @@ static int __exfat_rename(struct inode *old_parent_inode,
struct exfat_chain new_clu_to_free;
exfat_chain_set(&new_clu_to_free, new_ei->start_clu,
- EXFAT_B_TO_CLU_ROUND_UP(i_size_read(new_inode),
- sbi), new_ei->flags);
+ exfat_bytes_to_cluster_round_up(sbi, i_size_read(new_inode)),
+ new_ei->flags);
if (exfat_free_cluster(new_inode, &new_clu_to_free)) {
/* just set I/O error only */
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 95d87e2d7717..cb2f8eefff99 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -369,7 +369,7 @@ static int exfat_read_root(struct inode *inode, struct exfat_chain *root_clu)
ei->hint_stat.clu = sbi->root_dir;
ei->hint_femp.eidx = EXFAT_HINT_NONE;
- i_size_write(inode, EXFAT_CLU_TO_B(root_clu->size, sbi));
+ i_size_write(inode, exfat_cluster_to_bytes(sbi, root_clu->size));
num_subdirs = exfat_count_dir_entries(sb, root_clu);
if (num_subdirs < 0)
@@ -538,7 +538,7 @@ static int exfat_read_boot_sector(struct super_block *sb)
* machines.
*/
sb->s_maxbytes = min(MAX_LFS_FILESIZE,
- EXFAT_CLU_TO_B((loff_t)EXFAT_MAX_NUM_CLUSTER, sbi));
+ exfat_cluster_to_bytes(sbi, (loff_t)EXFAT_MAX_NUM_CLUSTER));
/* check logical sector size */
if (exfat_calibrate_blocksize(sb, 1 << p_boot->sect_size_bits))
--
2.25.1
* [PATCH v2 2/9] exfat: add balloc parameter to exfat_map_cluster() for iomap support
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 1/9] exfat: replace unsafe macros with static inline functions Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 3/9] exfat: add exfat_file_open() Namjae Jeon
` (6 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
In preparation for supporting the iomap infrastructure, we need to know
whether a new cluster was allocated or not in exfat_map_cluster().
Add an optional 'bool *balloc' output parameter. When a new cluster is
allocated, *balloc is set to true. Pass NULL from exfat_get_block() to
preserve the existing behavior.
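
The optional output parameter pattern can be sketched in userspace as follows. Everything here (struct toy_file, toy_map_cluster()) is a hypothetical toy, not the exfat code. Note that, as in the patch, the flag is only ever set to true, so a caller that passes a pointer is assumed to initialise it to false first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy file that only tracks how many clusters it owns. */
struct toy_file {
	unsigned int nr_clusters;
};

/*
 * Map cluster index clu_offset. If it lies beyond the current
 * allocation and create is set, extend the file and report the
 * allocation through *balloc. Callers that do not care pass NULL.
 */
static int toy_map_cluster(struct toy_file *f, unsigned int clu_offset,
			   bool create, bool *balloc)
{
	if (clu_offset < f->nr_clusters)
		return 0;		/* already mapped, *balloc untouched */

	if (!create)
		return -1;		/* unmapped and not allowed to allocate */

	f->nr_clusters = clu_offset + 1;
	if (balloc)
		*balloc = true;
	return 0;
}
```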
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/exfat/inode.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index d25e0a69865b..1246713567a5 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -124,7 +124,8 @@ void exfat_sync_inode(struct inode *inode)
* *clu = (~0), if it's unable to allocate a new cluster
*/
static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
- unsigned int *clu, unsigned int *count, int create)
+ unsigned int *clu, unsigned int *count, int create,
+ bool *balloc)
{
int ret;
unsigned int last_clu;
@@ -229,6 +230,8 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
if (exfat_cluster_walk(sb, clu, num_to_be_allocated - 1, ei->flags))
return -EIO;
*count = 1;
+ if (balloc)
+ *balloc = true;
}
/* hint information */
@@ -262,7 +265,7 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
/* Is this block already allocated? */
count = exfat_bytes_to_cluster_round_up(sbi, bh_result->b_size);
err = exfat_map_cluster(inode, iblock >> sbi->sect_per_clus_bits,
- &cluster, &count, create);
+ &cluster, &count, create, NULL);
if (err) {
if (err != -ENOSPC)
exfat_fs_error_ratelimit(sb,
--
2.25.1
* [PATCH v2 3/9] exfat: add exfat_file_open()
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 1/9] exfat: replace unsafe macros with static inline functions Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 2/9] exfat: add balloc parameter to exfat_map_cluster() for iomap support Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
2026-05-07 13:52 ` CharSyam
2026-05-07 12:42 ` [PATCH v2 4/9] exfat: add support for multi-cluster allocation Namjae Jeon
` (5 subsequent siblings)
8 siblings, 1 reply; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
Add exfat_file_open() to handle the file open operation for exFAT.
This change is a preparation step before introducing iomap-based direct
I/O support.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/exfat/file.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 29a36a80e29b..b7a4964631a0 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -800,7 +800,22 @@ static ssize_t exfat_splice_read(struct file *in, loff_t *ppos,
return filemap_splice_read(in, ppos, pipe, len, flags);
}
+static int exfat_file_open(struct inode *inode, struct file *filp)
+{
+ int err;
+
+ if (unlikely(exfat_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
+ err = generic_file_open(inode, filp);
+ if (err)
+ return err;
+
+ return 0;
+}
+
const struct file_operations exfat_file_operations = {
+ .open = exfat_file_open,
.llseek = generic_file_llseek,
.read_iter = exfat_file_read_iter,
.write_iter = exfat_file_write_iter,
--
2.25.1
* [PATCH v2 4/9] exfat: add support for multi-cluster allocation
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
` (2 preceding siblings ...)
2026-05-07 12:42 ` [PATCH v2 3/9] exfat: add exfat_file_open() Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
2026-05-07 14:09 ` CharSyam
2026-05-10 13:32 ` Chi Zhiling
2026-05-07 12:42 ` [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag Namjae Jeon
` (4 subsequent siblings)
8 siblings, 2 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
Currently exfat_map_cluster() allocates and returns only one cluster
at a time even when more clusters are needed. This causes multiple
FAT walks and repeated allocation calls during large sequential writes
or when using iomap for writes. Change exfat_map_cluster() and
exfat_alloc_cluster() so that they can allocate multiple contiguous
clusters.
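
The idea of a contiguous, possibly short, allocation can be modelled on a plain bitmap. This is a simplified userspace sketch under assumed names (toy_alloc_contig() is hypothetical). The real exfat_alloc_cluster() also maintains the FAT chain, the search hint, and the used-cluster count, none of which is modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Allocate up to 'want' clusters as one contiguous run, starting at
 * the first free cluster at or after 'hint'. The run stops at the
 * first in-use cluster, so fewer than 'want' clusters may be
 * returned. The caller reads the run start from *start and the run
 * length from the return value.
 */
static unsigned int toy_alloc_contig(bool *used, size_t nr_clusters,
				     size_t hint, unsigned int want,
				     size_t *start)
{
	unsigned int got = 0;
	size_t i = hint;

	/* Skip clusters that are already in use. */
	while (i < nr_clusters && used[i])
		i++;
	*start = i;

	/* Claim free clusters until the run breaks or the request is met. */
	while (i < nr_clusters && !used[i] && got < want) {
		used[i] = true;
		i++;
		got++;
	}
	return got;
}
```

Returning the actual run length lets a caller map one physically contiguous extent per call and simply ask again for the remainder, which is the shape an extent-based interface such as iomap expects.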
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/exfat/dir.c | 2 +-
fs/exfat/exfat_fs.h | 2 +-
fs/exfat/fatent.c | 26 ++++++++++++++++----------
fs/exfat/file.c | 2 +-
fs/exfat/inode.c | 23 ++++++-----------------
fs/exfat/namei.c | 2 +-
6 files changed, 26 insertions(+), 31 deletions(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index ca9d707220df..37f324399b6b 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -295,7 +295,7 @@ int exfat_alloc_new_dir(struct inode *inode, struct exfat_chain *clu)
exfat_chain_set(clu, EXFAT_EOF_CLUSTER, 0, ALLOC_NO_FAT_CHAIN);
- ret = exfat_alloc_cluster(inode, 1, clu, IS_DIRSYNC(inode));
+ ret = exfat_alloc_cluster(inode, 1, clu, IS_DIRSYNC(inode), false);
if (ret)
return ret;
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 9c8ab3df7a42..6e7fd6822b01 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -497,7 +497,7 @@ int exfat_clear_volume_dirty(struct super_block *sb);
exfat_cluster_walk(sb, (pclu), 1, ALLOC_FAT_CHAIN)
int exfat_alloc_cluster(struct inode *inode, unsigned int num_alloc,
- struct exfat_chain *p_chain, bool sync_bmap);
+ struct exfat_chain *p_chain, bool sync_bmap, bool contig);
int exfat_free_cluster(struct inode *inode, struct exfat_chain *p_chain);
int exfat_ent_get(struct super_block *sb, unsigned int loc,
unsigned int *content, struct buffer_head **last);
diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
index 45b0b754a2e4..a917c954bd23 100644
--- a/fs/exfat/fatent.c
+++ b/fs/exfat/fatent.c
@@ -419,7 +419,7 @@ int exfat_zeroed_cluster(struct inode *dir, unsigned int clu)
}
int exfat_alloc_cluster(struct inode *inode, unsigned int num_alloc,
- struct exfat_chain *p_chain, bool sync_bmap)
+ struct exfat_chain *p_chain, bool sync_bmap, bool contig)
{
int ret = -ENOSPC;
unsigned int total_cnt;
@@ -470,14 +470,20 @@ int exfat_alloc_cluster(struct inode *inode, unsigned int num_alloc,
while ((new_clu = exfat_find_free_bitmap(sb, hint_clu)) !=
EXFAT_EOF_CLUSTER) {
- if (new_clu != hint_clu &&
- p_chain->flags == ALLOC_NO_FAT_CHAIN) {
- if (exfat_chain_cont_cluster(sb, p_chain->dir,
- p_chain->size)) {
- ret = -EIO;
- goto free_cluster;
+ if (new_clu != hint_clu) {
+ if (p_chain->flags == ALLOC_NO_FAT_CHAIN) {
+ if (exfat_chain_cont_cluster(sb, p_chain->dir,
+ p_chain->size)) {
+ ret = -EIO;
+ goto free_cluster;
+ }
+ p_chain->flags = ALLOC_FAT_CHAIN;
+ }
+
+ if (contig && p_chain->size > 0) {
+ hint_clu--;
+ goto done;
}
- p_chain->flags = ALLOC_FAT_CHAIN;
}
/* update allocation bitmap */
@@ -507,9 +513,9 @@ int exfat_alloc_cluster(struct inode *inode, unsigned int num_alloc,
last_clu = new_clu;
if (p_chain->size == num_alloc) {
+done:
sbi->clu_srch_ptr = hint_clu;
- sbi->used_clusters += num_alloc;
-
+ sbi->used_clusters += p_chain->size;
mutex_unlock(&sbi->bitmap_lock);
return 0;
}
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index b7a4964631a0..15b9d6a1766a 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -56,7 +56,7 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
clu.flags = ei->flags;
ret = exfat_alloc_cluster(inode, new_num_clusters - num_clusters,
- &clu, inode_needs_sync(inode));
+ &clu, inode_needs_sync(inode), false);
if (ret)
return ret;
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 1246713567a5..7b09d94ac464 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -137,9 +137,9 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
unsigned int num_to_be_allocated = 0, num_clusters;
num_clusters = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
-
- if (clu_offset >= num_clusters)
- num_to_be_allocated = clu_offset - num_clusters + 1;
+ if (clu_offset > num_clusters ||
+ *count > num_clusters - clu_offset)
+ num_to_be_allocated = clu_offset + *count - num_clusters;
if (!create && (num_to_be_allocated > 0)) {
*clu = EXFAT_EOF_CLUSTER;
@@ -182,7 +182,7 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
}
ret = exfat_alloc_cluster(inode, num_to_be_allocated, &new_clu,
- inode_needs_sync(inode));
+ inode_needs_sync(inode), true);
if (ret)
return ret;
@@ -216,20 +216,9 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
}
*clu = new_clu.dir;
+ *count = new_clu.size;
- inode->i_blocks +=
- exfat_cluster_to_sectors(sbi, num_to_be_allocated) >> 9;
-
- /*
- * Move *clu pointer along FAT chains (hole care) because the
- * caller of this function expect *clu to be the last cluster.
- * This only works when num_to_be_allocated >= 2,
- * *clu = (the first cluster of the allocated chain) =>
- * (the last cluster of ...)
- */
- if (exfat_cluster_walk(sb, clu, num_to_be_allocated - 1, ei->flags))
- return -EIO;
- *count = 1;
+ inode->i_blocks += exfat_cluster_to_sectors(sbi, new_clu.size);
if (balloc)
*balloc = true;
}
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
index 752fbec9316b..87e55c5c1bf4 100644
--- a/fs/exfat/namei.c
+++ b/fs/exfat/namei.c
@@ -341,7 +341,7 @@ int exfat_find_empty_entry(struct inode *inode,
}
/* allocate a cluster */
- ret = exfat_alloc_cluster(inode, 1, &clu, IS_DIRSYNC(inode));
+ ret = exfat_alloc_cluster(inode, 1, &clu, IS_DIRSYNC(inode), false);
if (ret)
return ret;
--
2.25.1
* [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
` (3 preceding siblings ...)
2026-05-07 12:42 ` [PATCH v2 4/9] exfat: add support for multi-cluster allocation Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
2026-05-09 9:59 ` Chi Zhiling
2026-05-11 12:45 ` Christoph Hellwig
2026-05-07 12:42 ` [PATCH v2 6/9] exfat: add data_start_bytes and exfat_cluster_to_phys() helper Namjae Jeon
` (3 subsequent siblings)
8 siblings, 2 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC (permalink / raw)
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
In filesystems that maintain a separate Valid Data Length, such as exFAT
and NTFS, a partial write may start at or beyond the current valid_size and
extend it. In this case, the region after the previous valid_size but
within the same filesystem block is considered unwritten.
This patch introduces IOMAP_F_ZERO_TAIL. When this flag is set in the
iomap, __iomap_write_begin() zeroes only the tail portion of the block,
after the end of the write, while preserving any valid data before it in
the same block.
Without this tail zeroing, stale data in the unwritten portion of the block
can remain in the page cache. Subsequent reads can then return incorrect
contents from that region.
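As a rough userspace illustration of the tail-zeroing idea described above
(not the kernel code itself), the following sketch models one filesystem
block as a byte buffer that has already been read from disk. The function
name write_with_zero_tail() and its parameters are hypothetical and exist
only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model only. `block` stands in for one filesystem block in the
 * page cache whose contents were just read from disk.
 *
 * The write covers [from, from + len). Bytes before `from` are valid data
 * and are preserved. Bytes after the write, up to the end of the block, lie
 * beyond the new valid size and are zeroed so that stale on-disk contents
 * cannot be returned by a later read.
 */
static void write_with_zero_tail(unsigned char *block, size_t block_size,
				 size_t from, const unsigned char *data,
				 size_t len)
{
	size_t to = from + len;

	assert(from <= block_size && len <= block_size - from);

	memcpy(block + from, data, len);          /* the write itself */
	memset(block + to, 0, block_size - to);   /* zero only the tail */
}
```

The head of the block is left untouched, which is the difference from
zeroing the whole unwritten block.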
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/iomap/buffered-io.c | 4 ++++
include/linux/iomap.h | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d7b648421a70..9ae136a9fcd4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -837,6 +837,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
folio_zero_segments(folio, poff, from, to, poff + plen);
} else {
int status;
+ const struct iomap *iomap = iomap_iter_srcmap(iter);
if (iter->flags & IOMAP_NOWAIT)
return -EAGAIN;
@@ -853,6 +854,9 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
len, status, GFP_NOFS);
if (status)
return status;
+
+ if (iomap->flags & IOMAP_F_ZERO_TAIL)
+ folio_zero_segment(folio, to, poff + plen);
}
iomap_set_range_uptodate(folio, poff, plen);
} while ((block_start += plen) < block_end);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 2c5685adf3a9..e3e7ad72b29e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -67,6 +67,9 @@ struct vm_fault;
* bio, i.e. set REQ_ATOMIC.
*
* IOMAP_F_INTEGRITY indicates that the filesystems handles integrity metadata.
+ *
+ * IOMAP_F_ZERO_TAIL indicates only the unwritten tail of the block should be
+ * zeroed.
*/
#define IOMAP_F_NEW (1U << 0)
#define IOMAP_F_DIRTY (1U << 1)
@@ -86,6 +89,7 @@ struct vm_fault;
#else
#define IOMAP_F_INTEGRITY 0
#endif /* CONFIG_BLK_DEV_INTEGRITY */
+#define IOMAP_F_ZERO_TAIL (1U << 10)
/*
* Flag reserved for file system specific usage
--
2.25.1
* [PATCH v2 6/9] exfat: add data_start_bytes and exfat_cluster_to_phys() helper
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
` (4 preceding siblings ...)
2026-05-07 12:42 ` [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 7/9] exfat: add iomap buffered I/O support Namjae Jeon
` (2 subsequent siblings)
8 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC (permalink / raw)
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
This caches the data area start offset in bytes (data_start_bytes)
and introduces a helper function exfat_cluster_to_phys() to compute
the physical byte position of a given cluster.
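The arithmetic behind the helper can be sketched in userspace as follows.
This is a simplified model under stated assumptions: struct vol_geom and
cluster_to_phys() are hypothetical stand-ins for struct exfat_sb_info and
exfat_cluster_to_phys(), and RESERVED_CLUSTERS mirrors the fact that exFAT
cluster indices start at 2.

```c
#include <assert.h>
#include <stdint.h>

/* exFAT cluster numbering starts at 2; clusters 0 and 1 are reserved. */
#define RESERVED_CLUSTERS 2u

/* Hypothetical, reduced stand-in for the fields the helper needs. */
struct vol_geom {
	uint64_t data_start_bytes;      /* byte offset of the data area */
	unsigned int cluster_size_bits; /* log2(cluster size in bytes) */
};

/*
 * Byte position of the first byte of cluster `clus` on the device:
 * (clus - 2) clusters past the start of the data area.
 */
static uint64_t cluster_to_phys(const struct vol_geom *g, uint32_t clus)
{
	assert(clus >= RESERVED_CLUSTERS);

	return ((uint64_t)(clus - RESERVED_CLUSTERS) << g->cluster_size_bits) +
	       g->data_start_bytes;
}
```

Widening to 64 bits before the shift matters: with large clusters the
product can exceed 32 bits even though the cluster number itself does not.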
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/exfat/exfat_fs.h | 8 ++++++++
fs/exfat/super.c | 1 +
2 files changed, 9 insertions(+)
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 6e7fd6822b01..415f987afa9a 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -227,6 +227,7 @@ struct exfat_sb_info {
unsigned long long FAT1_start_sector; /* FAT1 start sector */
unsigned long long FAT2_start_sector; /* FAT2 start sector */
unsigned long long data_start_sector; /* data area start sector */
+ unsigned long long data_start_bytes;
unsigned int num_FAT_sectors; /* num of FAT sectors */
unsigned int root_dir; /* root dir cluster */
unsigned int dentries_per_clu; /* num of dentries per cluster */
@@ -400,6 +401,13 @@ static inline loff_t exfat_ondisk_size(const struct inode *inode)
return ((loff_t)inode->i_blocks) << 9;
}
+static inline loff_t exfat_cluster_to_phys(struct exfat_sb_info *sbi,
+ unsigned int clus)
+{
+ return ((loff_t)(clus - EXFAT_RESERVED_CLUSTERS) << sbi->cluster_size_bits) +
+ sbi->data_start_bytes;
+}
+
/*
* helpers for cluster size to byte conversion.
*/
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index cb2f8eefff99..388db271c6bf 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -499,6 +499,7 @@ static int exfat_read_boot_sector(struct super_block *sb)
if (p_boot->num_fats == 2)
sbi->FAT2_start_sector += sbi->num_FAT_sectors;
sbi->data_start_sector = le32_to_cpu(p_boot->clu_offset);
+ sbi->data_start_bytes = sbi->data_start_sector << p_boot->sect_size_bits;
sbi->num_sectors = le64_to_cpu(p_boot->vol_length);
/* because the cluster index starts with 2 */
sbi->num_clusters = le32_to_cpu(p_boot->clu_count) +
--
2.25.1
* [PATCH v2 7/9] exfat: add iomap buffered I/O support
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
` (5 preceding siblings ...)
2026-05-07 12:42 ` [PATCH v2 6/9] exfat: add data_start_bytes and exfat_cluster_to_phys() helper Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 8/9] exfat: add iomap direct " Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 9/9] exfat: add support for SEEK_HOLE and SEEK_DATA in llseek Namjae Jeon
8 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC (permalink / raw)
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
Add full buffered I/O support using the iomap framework to the exfat
filesystem. This replaces the old exfat_get_block(), exfat_write_begin(),
exfat_write_end(), and exfat_block_truncate_page() with their iomap
equivalents. Buffered writes now use iomap_file_buffered_write(), reads use
iomap_read_folio() and iomap_readahead() with exfat-specific read ops, and
writeback is handled through iomap_writepages().
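The read-side mapping decision that this patch makes around valid_size can
be modelled in userspace roughly as below. This is a sketch, not the patch
code: classify_read(), struct read_map and the MAP_* names are hypothetical,
and cluster lookup is left out entirely. It only shows how a range at or
beyond valid_size is reported as unwritten, and how a mapped range that
crosses valid_size is trimmed to the end of the block containing valid_size.

```c
#include <assert.h>
#include <stdint.h>

enum map_type { MAP_MAPPED, MAP_UNWRITTEN };

struct read_map {
	enum map_type type;
	uint64_t length;
};

/* Round v up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t v, uint64_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/*
 * Model of the read path: offsets at or past valid_size are unwritten.
 * A mapped range that crosses valid_size ends at the block boundary just
 * above valid_size, so the next mapping starts in the unwritten region.
 */
static struct read_map classify_read(uint64_t offset, uint64_t length,
				     uint64_t valid_size, uint64_t block_size)
{
	struct read_map m = { MAP_MAPPED, length };

	assert(block_size && (block_size & (block_size - 1)) == 0);

	if (offset >= valid_size)
		m.type = MAP_UNWRITTEN;
	else if (offset + length > valid_size)
		m.length = round_up_pow2(valid_size, block_size) - offset;

	return m;
}
```

Because the trimmed mapping still includes the block that straddles
valid_size, the bytes past valid_size in that block come from disk and have
to be zeroed separately, which the series does in its read completion
handler.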
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/exfat/Kconfig | 1 +
fs/exfat/Makefile | 2 +-
fs/exfat/exfat_fs.h | 7 +-
fs/exfat/file.c | 141 ++++++++++++++++++++-----------
fs/exfat/inode.c | 117 ++++++++------------------
fs/exfat/iomap.c | 197 ++++++++++++++++++++++++++++++++++++++++++++
fs/exfat/iomap.h | 14 ++++
7 files changed, 350 insertions(+), 129 deletions(-)
create mode 100644 fs/exfat/iomap.c
create mode 100644 fs/exfat/iomap.h
diff --git a/fs/exfat/Kconfig b/fs/exfat/Kconfig
index cbeca8e44d9b..e0b200902253 100644
--- a/fs/exfat/Kconfig
+++ b/fs/exfat/Kconfig
@@ -5,6 +5,7 @@ config EXFAT_FS
select BUFFER_HEAD
select NLS
select LEGACY_DIRECT_IO
+ select FS_IOMAP
help
This allows you to mount devices formatted with the exFAT file system.
exFAT is typically used on SD-Cards or USB sticks.
diff --git a/fs/exfat/Makefile b/fs/exfat/Makefile
index ed51926a4971..e06bf85870ae 100644
--- a/fs/exfat/Makefile
+++ b/fs/exfat/Makefile
@@ -5,4 +5,4 @@
obj-$(CONFIG_EXFAT_FS) += exfat.o
exfat-y := inode.o namei.o dir.o super.o fatent.o cache.o nls.o misc.o \
- file.o balloc.o
+ file.o balloc.o iomap.o
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 415f987afa9a..448857d4b70f 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -12,6 +12,7 @@
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
#include <uapi/linux/exfat.h>
+#include <linux/buffer_head.h>
#define EXFAT_ROOT_INO 1
@@ -293,6 +294,8 @@ struct exfat_inode_info {
/* on-disk position of directory entry or 0 */
loff_t i_pos;
loff_t valid_size;
+ /* page-aligned size that has been zeroed out for mmap */
+ loff_t zeroed_size;
/* hash by i_location */
struct hlist_node i_hash_fat;
/* protect bmap against truncate */
@@ -648,7 +651,9 @@ struct inode *exfat_iget(struct super_block *sb, loff_t i_pos);
int __exfat_write_inode(struct inode *inode, int sync);
int exfat_write_inode(struct inode *inode, struct writeback_control *wbc);
void exfat_evict_inode(struct inode *inode);
-int exfat_block_truncate_page(struct inode *inode, loff_t from);
+int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
+ unsigned int *clu, unsigned int *count, int create,
+ bool *balloc);
/* exfat/nls.c */
unsigned short exfat_toupper(struct super_block *sb, unsigned short a);
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 15b9d6a1766a..6033e8ae4628 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -14,9 +14,11 @@
#include <linux/writeback.h>
#include <linux/filelock.h>
#include <linux/falloc.h>
+#include <linux/iomap.h>
#include "exfat_raw.h"
#include "exfat_fs.h"
+#include "iomap.h"
static int exfat_cont_expand(struct inode *inode, loff_t size)
{
@@ -26,8 +28,9 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
struct super_block *sb = inode->i_sb;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
struct exfat_chain clu;
+ loff_t oldsize = i_size_read(inode);
- truncate_pagecache(inode, i_size_read(inode));
+ truncate_pagecache(inode, oldsize);
ret = inode_newsize_ok(inode, size);
if (ret)
@@ -78,6 +81,13 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
/* Expanded range not zeroed, do not update valid_size */
i_size_write(inode, size);
+ /*
+ * When extending file size, call truncate_pagecache() first,
+ * then update i_size, and call pagecache_isize_extended()
+ * to ensure the straddling folio is properly marked RO so
+ * page_mkwrite() is called and post-EOF area is zeroed.
+ */
+ pagecache_isize_extended(inode, oldsize, inode->i_size);
inode->i_blocks = round_up(size, sbi->cluster_size) >> 9;
mark_inode_dirty(inode);
@@ -236,7 +246,7 @@ int __exfat_truncate(struct inode *inode)
}
if (i_size_read(inode) < ei->valid_size)
- ei->valid_size = i_size_read(inode);
+ ei->valid_size = ei->zeroed_size = i_size_read(inode);
if (ei->type == TYPE_FILE)
ei->attr |= EXFAT_ATTR_ARCHIVE;
@@ -383,10 +393,6 @@ int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
exfat_truncate_inode_atime(inode);
if (attr->ia_valid & ATTR_SIZE) {
- error = exfat_block_truncate_page(inode, attr->ia_size);
- if (error)
- goto out;
-
down_write(&EXFAT_I(inode)->truncate_lock);
truncate_setsize(inode, attr->ia_size);
@@ -631,42 +637,31 @@ int exfat_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
static int exfat_extend_valid_size(struct inode *inode, loff_t new_valid_size)
{
- int err;
- loff_t pos;
struct exfat_inode_info *ei = EXFAT_I(inode);
- struct address_space *mapping = inode->i_mapping;
- const struct address_space_operations *ops = mapping->a_ops;
-
- pos = ei->valid_size;
- while (pos < new_valid_size) {
- u32 len;
- struct folio *folio;
- unsigned long off;
-
- len = PAGE_SIZE - (pos & (PAGE_SIZE - 1));
- if (pos + len > new_valid_size)
- len = new_valid_size - pos;
-
- err = ops->write_begin(NULL, mapping, pos, len, &folio, NULL);
- if (err)
- goto out;
+ struct exfat_sb_info *sbi = EXFAT_SB(inode->i_sb);
+ loff_t old_valid_size;
+ int ret = 0;
- off = offset_in_folio(folio, pos);
- folio_zero_new_buffers(folio, off, off + len);
+ mutex_lock(&sbi->s_lock);
+ old_valid_size = ei->valid_size;
+ mutex_unlock(&sbi->s_lock);
- err = ops->write_end(NULL, mapping, pos, len, len, folio, NULL);
- if (err < 0)
- goto out;
- pos += len;
+ if (old_valid_size < new_valid_size) {
+ if (i_size_read(inode) < new_valid_size) {
+ i_size_write(inode, new_valid_size);
+ mark_inode_dirty(inode);
+ }
- balance_dirty_pages_ratelimited(mapping);
- cond_resched();
+ ret = iomap_zero_range(inode, old_valid_size,
+ new_valid_size - old_valid_size, NULL,
+ &exfat_write_iomap_ops, NULL, NULL);
+ if (ret) {
+ truncate_setsize(inode, old_valid_size);
+ exfat_truncate(inode);
+ }
}
- return 0;
-
-out:
- return err;
+ return ret;
}
static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
@@ -677,6 +672,7 @@ static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
struct exfat_inode_info *ei = EXFAT_I(inode);
loff_t pos = iocb->ki_pos;
loff_t valid_size;
+ int err;
if (unlikely(exfat_forced_shutdown(inode->i_sb)))
return -EIO;
@@ -702,6 +698,12 @@ static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
}
}
+ err = file_modified(iocb->ki_filp);
+ if (err) {
+ ret = err;
+ goto unlock;
+ }
+
if (pos > valid_size) {
ret = exfat_extend_valid_size(inode, pos);
if (ret < 0 && ret != -ENOSPC) {
@@ -713,7 +715,11 @@ static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
goto unlock;
}
- ret = __generic_file_write_iter(iocb, iter);
+ if (iocb->ki_flags & IOCB_DIRECT)
+ ret = __generic_file_write_iter(iocb, iter);
+ else
+ ret = iomap_file_buffered_write(iocb, iter,
+ &exfat_write_iomap_ops, NULL, NULL);
if (ret < 0)
goto unlock;
@@ -749,28 +755,56 @@ static ssize_t exfat_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
static vm_fault_t exfat_page_mkwrite(struct vm_fault *vmf)
{
- int err;
struct inode *inode = file_inode(vmf->vma->vm_file);
struct exfat_inode_info *ei = EXFAT_I(inode);
- loff_t new_valid_size;
+ vm_fault_t ret;
+ loff_t new_valid_size, mmap_valid_size;
if (!inode_trylock(inode))
return VM_FAULT_RETRY;
- new_valid_size = ((loff_t)vmf->pgoff + 1) << PAGE_SHIFT;
- new_valid_size = min(new_valid_size, i_size_read(inode));
+ mmap_valid_size = ((loff_t)vmf->pgoff + 1) << PAGE_SHIFT;
+ new_valid_size = min(mmap_valid_size, i_size_read(inode));
if (ei->valid_size < new_valid_size) {
- err = exfat_extend_valid_size(inode, new_valid_size);
- if (err < 0) {
- inode_unlock(inode);
- return vmf_fs_error(err);
+ if (ei->zeroed_size < mmap_valid_size) {
+ int err;
+
+ /*
+ * Only zero the range that hasn't been zeroed yet for
+ * this mmap write path. zeroed_size tracks the largest
+ * page-aligned offset that has already been zeroed.
+ *
+ * This prevents unnecessarily zeroing out the entire
+ * tail page on every page fault when userspace writes
+ * data byte-by-byte through mmap (after a small
+ * fallocate). It fixes data corruption in the tail page
+ * while preserving the existing valid_size semantics.
+ */
+ err = iomap_zero_range(inode, ei->zeroed_size,
+ mmap_valid_size - ei->zeroed_size, NULL,
+ &exfat_write_iomap_ops, NULL, NULL);
+ if (err < 0) {
+ inode_unlock(inode);
+ return vmf_fs_error(err);
+ }
+ ei->zeroed_size = mmap_valid_size;
}
+
+ ei->valid_size = new_valid_size;
+ mark_inode_dirty(inode);
}
+ sb_start_pagefault(inode->i_sb);
+ file_update_time(vmf->vma->vm_file);
+
+ filemap_invalidate_lock_shared(inode->i_mapping);
+ ret = iomap_page_mkwrite(vmf, &exfat_write_iomap_ops, NULL);
+ filemap_invalidate_unlock_shared(inode->i_mapping);
+ sb_end_pagefault(inode->i_sb);
inode_unlock(inode);
- return filemap_page_mkwrite(vmf);
+ return ret;
}
static const struct vm_operations_struct exfat_file_vm_ops = {
@@ -786,6 +820,21 @@ static int exfat_file_mmap_prepare(struct vm_area_desc *desc)
if (unlikely(exfat_forced_shutdown(file_inode(desc->file)->i_sb)))
return -EIO;
+ if (vma_desc_test_all(desc, VMA_SHARED_BIT, VMA_MAYWRITE_BIT)) {
+ struct inode *inode = file_inode(file);
+ loff_t from, to;
+ int err;
+
+ from = ((loff_t)desc->pgoff << PAGE_SHIFT);
+ to = min_t(loff_t, i_size_read(inode),
+ from + vma_desc_size(desc));
+ if (EXFAT_I(inode)->valid_size < to) {
+ err = exfat_extend_valid_size(inode, to);
+ if (err)
+ return err;
+ }
+ }
+
file_accessed(file);
desc->vm_ops = &exfat_file_vm_ops;
return 0;
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 7b09d94ac464..6083ccef9408 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -13,9 +13,11 @@
#include <linux/uio.h>
#include <linux/random.h>
#include <linux/iversion.h>
+#include <linux/iomap.h>
#include "exfat_raw.h"
#include "exfat_fs.h"
+#include "iomap.h"
int __exfat_write_inode(struct inode *inode, int sync)
{
@@ -76,15 +78,7 @@ int __exfat_write_inode(struct inode *inode, int sync)
on_disk_size = 0;
ep2->dentry.stream.size = cpu_to_le64(on_disk_size);
- /*
- * mmap write does not use exfat_write_end(), valid_size may be
- * extended to the sector-aligned length in exfat_get_block().
- * So we need to fixup valid_size to the writren length.
- */
- if (on_disk_size < ei->valid_size)
- ep2->dentry.stream.valid_size = ep2->dentry.stream.size;
- else
- ep2->dentry.stream.valid_size = cpu_to_le64(ei->valid_size);
+ ep2->dentry.stream.valid_size = cpu_to_le64(ei->valid_size);
if (on_disk_size) {
ep2->dentry.stream.flags = ei->flags;
@@ -123,7 +117,7 @@ void exfat_sync_inode(struct inode *inode)
* Output: errcode, cluster number
* *clu = (~0), if it's unable to allocate a new cluster
*/
-static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
+int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
unsigned int *clu, unsigned int *count, int create,
bool *balloc)
{
@@ -377,7 +371,13 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
static int exfat_read_folio(struct file *file, struct folio *folio)
{
- return mpage_read_folio(folio, exfat_get_block);
+ struct iomap_read_folio_ctx ctx = {
+ .cur_folio = folio,
+ .ops = &exfat_iomap_bio_read_ops,
+ };
+
+ iomap_read_folio(&exfat_iomap_ops, &ctx, NULL);
+ return 0;
}
static void exfat_readahead(struct readahead_control *rac)
@@ -386,6 +386,10 @@ static void exfat_readahead(struct readahead_control *rac)
struct inode *inode = mapping->host;
struct exfat_inode_info *ei = EXFAT_I(inode);
loff_t pos = readahead_pos(rac);
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &exfat_iomap_bio_read_ops,
+ .rac = rac,
+ };
/* Range cross valid_size, read it page by page. */
if (ei->valid_size < i_size_read(inode) &&
@@ -393,16 +397,22 @@ static void exfat_readahead(struct readahead_control *rac)
ei->valid_size < pos + readahead_length(rac))
return;
- mpage_readahead(rac, exfat_get_block);
+ iomap_readahead(&exfat_iomap_ops, &ctx, NULL);
}
static int exfat_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
+ struct iomap_writepage_ctx wpc = {
+ .inode = mapping->host,
+ .wbc = wbc,
+ .ops = &exfat_writeback_ops,
+ };
+
if (unlikely(exfat_forced_shutdown(mapping->host->i_sb)))
return -EIO;
- return mpage_writepages(mapping, wbc, exfat_get_block);
+ return iomap_writepages(&wpc);
}
static void exfat_write_failed(struct address_space *mapping, loff_t to)
@@ -416,51 +426,6 @@ static void exfat_write_failed(struct address_space *mapping, loff_t to)
}
}
-static int exfat_write_begin(const struct kiocb *iocb,
- struct address_space *mapping,
- loff_t pos, unsigned int len,
- struct folio **foliop, void **fsdata)
-{
- int ret;
-
- if (unlikely(exfat_forced_shutdown(mapping->host->i_sb)))
- return -EIO;
-
- ret = block_write_begin(mapping, pos, len, foliop, exfat_get_block);
-
- if (ret < 0)
- exfat_write_failed(mapping, pos+len);
-
- return ret;
-}
-
-static int exfat_write_end(const struct kiocb *iocb,
- struct address_space *mapping,
- loff_t pos, unsigned int len, unsigned int copied,
- struct folio *folio, void *fsdata)
-{
- struct inode *inode = mapping->host;
- struct exfat_inode_info *ei = EXFAT_I(inode);
- int err;
-
- err = generic_write_end(iocb, mapping, pos, len, copied, folio, fsdata);
- if (err < len)
- exfat_write_failed(mapping, pos+len);
-
- if (!(err < 0) && pos + err > ei->valid_size) {
- ei->valid_size = pos + err;
- mark_inode_dirty(inode);
- }
-
- if (!(err < 0) && !(ei->attr & EXFAT_ATTR_ARCHIVE)) {
- inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
- ei->attr |= EXFAT_ATTR_ARCHIVE;
- mark_inode_dirty(inode);
- }
-
- return err;
-}
-
static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
struct address_space *mapping = iocb->ki_filp->f_mapping;
@@ -510,34 +475,23 @@ static sector_t exfat_aop_bmap(struct address_space *mapping, sector_t block)
/* exfat_get_cluster() assumes the requested blocknr isn't truncated. */
down_read(&EXFAT_I(mapping->host)->truncate_lock);
- blocknr = generic_block_bmap(mapping, block, exfat_get_block);
+ blocknr = iomap_bmap(mapping, block, &exfat_iomap_ops);
up_read(&EXFAT_I(mapping->host)->truncate_lock);
return blocknr;
}
-/*
- * exfat_block_truncate_page() zeroes out a mapping from file offset `from'
- * up to the end of the block which corresponds to `from'.
- * This is required during truncate to physically zeroout the tail end
- * of that block so it doesn't yield old data if the file is later grown.
- * Also, avoid causing failure from fsx for cases of "data past EOF"
- */
-int exfat_block_truncate_page(struct inode *inode, loff_t from)
-{
- return block_truncate_page(inode->i_mapping, from, exfat_get_block);
-}
-
static const struct address_space_operations exfat_aops = {
- .dirty_folio = block_dirty_folio,
- .invalidate_folio = block_invalidate_folio,
- .read_folio = exfat_read_folio,
- .readahead = exfat_readahead,
- .writepages = exfat_writepages,
- .write_begin = exfat_write_begin,
- .write_end = exfat_write_end,
- .direct_IO = exfat_direct_IO,
- .bmap = exfat_aop_bmap,
- .migrate_folio = buffer_migrate_folio,
+ .read_folio = exfat_read_folio,
+ .readahead = exfat_readahead,
+ .writepages = exfat_writepages,
+ .dirty_folio = iomap_dirty_folio,
+ .bmap = exfat_aop_bmap,
+ .migrate_folio = filemap_migrate_folio,
+ .is_partially_uptodate = iomap_is_partially_uptodate,
+ .error_remove_folio = generic_error_remove_folio,
+ .release_folio = iomap_release_folio,
+ .invalidate_folio = iomap_invalidate_folio,
+ .direct_IO = exfat_direct_IO,
};
static inline unsigned long exfat_hash(loff_t i_pos)
@@ -601,6 +555,7 @@ static int exfat_fill_inode(struct inode *inode, struct exfat_dir_entry *info)
ei->flags = info->flags;
ei->type = info->type;
ei->valid_size = info->valid_size;
+ ei->zeroed_size = info->valid_size;
ei->version = 0;
ei->hint_stat.eidx = 0;
diff --git a/fs/exfat/iomap.c b/fs/exfat/iomap.c
new file mode 100644
index 000000000000..0c5aadfd4132
--- /dev/null
+++ b/fs/exfat/iomap.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * iomap callback functions
+ *
+ * Copyright (C) 2026 Namjae Jeon <linkinjeon@kernel.org>
+ */
+
+#include <linux/iomap.h>
+#include <linux/pagemap.h>
+
+#include "exfat_raw.h"
+#include "exfat_fs.h"
+#include "iomap.h"
+
+static int __exfat_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap, bool may_alloc)
+{
+ struct super_block *sb = inode->i_sb;
+ struct exfat_sb_info *sbi = EXFAT_SB(sb);
+ struct exfat_inode_info *ei = EXFAT_I(inode);
+ unsigned int cluster, num_clusters;
+ loff_t cluster_offset, cluster_length;
+ int err;
+ bool balloc = false;
+
+ if (may_alloc)
+ num_clusters = exfat_bytes_to_cluster_round_up(sbi,
+ offset + length) - exfat_bytes_to_cluster(sbi, offset);
+ else
+ num_clusters = exfat_bytes_to_cluster_round_up(sbi, length);
+
+ mutex_lock(&sbi->s_lock);
+ iomap->bdev = inode->i_sb->s_bdev;
+ iomap->offset = offset;
+
+ err = exfat_map_cluster(inode, exfat_bytes_to_cluster(sbi, offset),
+ &cluster, &num_clusters, may_alloc, &balloc);
+ if (err)
+ goto out;
+
+ cluster_offset = exfat_cluster_offset(sbi, offset);
+ cluster_length = exfat_cluster_to_bytes(sbi, num_clusters);
+ if (length > cluster_length - cluster_offset)
+ iomap->length = cluster_length - cluster_offset;
+ else
+ iomap->length = length;
+
+ iomap->addr = exfat_cluster_to_phys(sbi, cluster) + cluster_offset;
+ iomap->type = IOMAP_MAPPED;
+ if (may_alloc) {
+ if (balloc)
+ iomap->flags = IOMAP_F_NEW;
+ else if (iomap->offset + iomap->length >= ei->valid_size)
+ iomap->flags = IOMAP_F_ZERO_TAIL;
+ } else {
+ if (offset >= ei->valid_size)
+ iomap->type = IOMAP_UNWRITTEN;
+
+ if (iomap->type == IOMAP_MAPPED &&
+ iomap->offset < ei->valid_size &&
+ iomap->offset + iomap->length > ei->valid_size) {
+ iomap->length = round_up(ei->valid_size,
+ 1 << inode->i_blkbits) -
+ iomap->offset;
+ }
+ }
+
+ iomap->flags |= IOMAP_F_MERGED;
+out:
+ mutex_unlock(&sbi->s_lock);
+ return err;
+}
+
+static int exfat_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
+{
+ return __exfat_iomap_begin(inode, offset, length, flags, iomap, false);
+}
+
+static int exfat_write_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
+{
+ return __exfat_iomap_begin(inode, offset, length, flags, iomap, true);
+}
+
+const struct iomap_ops exfat_iomap_ops = {
+ .iomap_begin = exfat_iomap_begin,
+};
+
+/*
+ * exfat_write_iomap_end - Update the state after write
+ *
+ * Extends ->valid_size to cover the newly written range.
+ * Marks the inode dirty if metadata was changed.
+ */
+static int exfat_write_iomap_end(struct inode *inode, loff_t pos, loff_t length,
+ ssize_t written, unsigned int flags, struct iomap *iomap)
+{
+ if (written) {
+ struct exfat_sb_info *sbi = EXFAT_SB(inode->i_sb);
+ struct exfat_inode_info *ei = EXFAT_I(inode);
+ bool dirtied = false;
+ loff_t end = pos + written;
+
+ mutex_lock(&sbi->s_lock);
+ if (ei->valid_size < end) {
+ ei->valid_size = end;
+ if (ei->zeroed_size < end)
+ ei->zeroed_size = end;
+ dirtied = true;
+ }
+ mutex_unlock(&sbi->s_lock);
+
+ if (dirtied || iomap->flags & IOMAP_F_SIZE_CHANGED)
+ mark_inode_dirty(inode);
+ }
+
+ return written;
+}
+
+const struct iomap_ops exfat_write_iomap_ops = {
+ .iomap_begin = exfat_write_iomap_begin,
+ .iomap_end = exfat_write_iomap_end,
+};
+
+/*
+ * exfat_writeback_range - Map folio during writeback
+ *
+ * Called for each folio during writeback. If the folio falls outside the
+ * current iomap, remaps by calling __exfat_iomap_begin() without allocation.
+ */
+static ssize_t exfat_writeback_range(struct iomap_writepage_ctx *wpc,
+ struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
+{
+ if (offset < wpc->iomap.offset ||
+ offset >= wpc->iomap.offset + wpc->iomap.length) {
+ int error;
+
+ error = __exfat_iomap_begin(wpc->inode, offset, len,
+ 0, &wpc->iomap, false);
+ if (error)
+ return error;
+ }
+
+ return iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
+}
+
+const struct iomap_writeback_ops exfat_writeback_ops = {
+ .writeback_range = exfat_writeback_range,
+ .writeback_submit = iomap_ioend_writeback_submit,
+};
+
+/**
+ * exfat_iomap_read_end_io - iomap read bio completion handler for exFAT
+ * @bio: bio that has completed reading
+ *
+ * exfat_iomap_begin() rounds up MAPPED extents to the block boundary of
+ * valid_size. This ensures that any subsequent blocks are treated as
+ * IOMAP_UNWRITTEN, but it also causes the "straddle block" containing
+ * valid_size to be read from disk. The disk data beyond valid_size in
+ * this block is stale and must be zeroed to prevent data leakage.
+ */
+static void exfat_iomap_read_end_io(struct bio *bio)
+{
+ int error = blk_status_to_errno(bio->bi_status);
+ struct folio_iter iter;
+
+ bio_for_each_folio_all(iter, bio) {
+ struct folio *folio = iter.folio;
+ struct exfat_inode_info *ei = EXFAT_I(folio->mapping->host);
+ s64 valid_size;
+ loff_t pos = folio_pos(folio);
+
+ valid_size = ei->valid_size;
+ if (pos + iter.offset < valid_size &&
+ pos + iter.offset + iter.length > valid_size)
+ folio_zero_segment(folio, offset_in_folio(folio, valid_size),
+ iter.offset + iter.length);
+
+ iomap_finish_folio_read(folio, iter.offset, iter.length, error);
+ }
+ bio_put(bio);
+}
+
+static void exfat_iomap_bio_submit_read(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx)
+{
+ struct bio *bio = ctx->read_ctx;
+
+ bio->bi_end_io = exfat_iomap_read_end_io;
+ submit_bio(bio);
+}
+
+const struct iomap_read_ops exfat_iomap_bio_read_ops = {
+ .read_folio_range = iomap_bio_read_folio_range,
+ .submit_read = exfat_iomap_bio_submit_read,
+};
diff --git a/fs/exfat/iomap.h b/fs/exfat/iomap.h
new file mode 100644
index 000000000000..7f8dcbe20a17
--- /dev/null
+++ b/fs/exfat/iomap.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2026 Namjae Jeon <linkinjeon@kernel.org>
+ */
+
+#ifndef _LINUX_EXFAT_IOMAP_H
+#define _LINUX_EXFAT_IOMAP_H
+
+extern const struct iomap_ops exfat_iomap_ops;
+extern const struct iomap_ops exfat_write_iomap_ops;
+extern const struct iomap_writeback_ops exfat_writeback_ops;
+extern const struct iomap_read_ops exfat_iomap_bio_read_ops;
+
+#endif /* _LINUX_EXFAT_IOMAP_H */
--
2.25.1
* [PATCH v2 8/9] exfat: add iomap direct I/O support
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
` (6 preceding siblings ...)
2026-05-07 12:42 ` [PATCH v2 7/9] exfat: add iomap buffered I/O support Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
2026-05-07 12:42 ` [PATCH v2 9/9] exfat: add support for SEEK_HOLE and SEEK_DATA in llseek Namjae Jeon
8 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC (permalink / raw)
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
Add iomap-based direct I/O support to the exfat filesystem. This replaces
the previous exfat_direct_IO() implementation, which used
blockdev_direct_IO(), with the iomap_dio_rw() interface.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/exfat/Kconfig | 1 -
fs/exfat/exfat_fs.h | 1 -
fs/exfat/file.c | 76 +++++++++++++----
fs/exfat/inode.c | 200 --------------------------------------------
fs/exfat/iomap.c | 26 ++++++
fs/exfat/iomap.h | 1 +
6 files changed, 89 insertions(+), 216 deletions(-)
diff --git a/fs/exfat/Kconfig b/fs/exfat/Kconfig
index e0b200902253..1fcb10c8d7bc 100644
--- a/fs/exfat/Kconfig
+++ b/fs/exfat/Kconfig
@@ -4,7 +4,6 @@ config EXFAT_FS
tristate "exFAT filesystem support"
select BUFFER_HEAD
select NLS
- select LEGACY_DIRECT_IO
select FS_IOMAP
help
This allows you to mount devices formatted with the exFAT file system.
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 448857d4b70f..6f3ad1586261 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -557,7 +557,6 @@ int exfat_trim_fs(struct inode *inode, struct fstrim_range *range);
/* file.c */
extern const struct file_operations exfat_file_operations;
int __exfat_truncate(struct inode *inode);
-void exfat_truncate(struct inode *inode);
int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *attr);
int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 6033e8ae4628..c4e6afc21bfe 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -292,7 +292,7 @@ int __exfat_truncate(struct inode *inode)
return 0;
}
-void exfat_truncate(struct inode *inode)
+static int exfat_truncate(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -315,6 +315,8 @@ void exfat_truncate(struct inode *inode)
inode->i_blocks = round_up(i_size_read(inode), sbi->cluster_size) >> 9;
write_size:
mutex_unlock(&sbi->s_lock);
+
+ return err;
}
int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
@@ -400,7 +402,7 @@ int exfat_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
* __exfat_write_inode() is called from exfat_truncate(), inode
* is already written by it, so mark_inode_dirty() is unneeded.
*/
- exfat_truncate(inode);
+ error = exfat_truncate(inode);
up_write(&EXFAT_I(inode)->truncate_lock);
} else
mark_inode_dirty(inode);
@@ -664,6 +666,47 @@ static int exfat_extend_valid_size(struct inode *inode, loff_t new_valid_size)
return ret;
}
+static ssize_t exfat_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+ ssize_t ret;
+
+ ret = iomap_dio_rw(iocb, from, &exfat_write_iomap_ops,
+ &exfat_write_dio_ops, 0, NULL, 0);
+ if (ret == -ENOTBLK)
+ ret = 0;
+ else if (ret < 0)
+ goto out;
+
+ if (iov_iter_count(from)) {
+ loff_t offset, end;
+ ssize_t written;
+ int ret2;
+
+ offset = iocb->ki_pos;
+ iocb->ki_flags &= ~IOCB_DIRECT;
+ written = iomap_file_buffered_write(iocb, from,
+ &exfat_write_iomap_ops, NULL, NULL);
+ if (written < 0) {
+ ret = written;
+ goto out;
+ }
+
+ ret += written;
+ end = iocb->ki_pos + written - 1;
+ ret2 = filemap_write_and_wait_range(iocb->ki_filp->f_mapping,
+ offset, end);
+ if (ret2) {
+ ret = -EIO;
+ goto out;
+ }
+ invalidate_mapping_pages(iocb->ki_filp->f_mapping,
+ offset >> PAGE_SHIFT,
+ end >> PAGE_SHIFT);
+ }
+out:
+ return ret;
+}
+
static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
{
ssize_t ret;
@@ -688,16 +731,6 @@ static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
if (ret <= 0)
goto unlock;
- if (iocb->ki_flags & IOCB_DIRECT) {
- unsigned long align = pos | iov_iter_alignment(iter);
-
- if (!IS_ALIGNED(align, i_blocksize(inode)) &&
- !IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) {
- ret = -EINVAL;
- goto unlock;
- }
- }
-
err = file_modified(iocb->ki_filp);
if (err) {
ret = err;
@@ -716,7 +749,7 @@ static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
}
if (iocb->ki_flags & IOCB_DIRECT)
- ret = __generic_file_write_iter(iocb, iter);
+ ret = exfat_dio_write_iter(iocb, iter);
else
ret = iomap_file_buffered_write(iocb, iter,
&exfat_write_iomap_ops, NULL, NULL);
@@ -746,11 +779,24 @@ static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
static ssize_t exfat_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
struct inode *inode = file_inode(iocb->ki_filp);
+ ssize_t ret;
if (unlikely(exfat_forced_shutdown(inode->i_sb)))
return -EIO;
- return generic_file_read_iter(iocb, iter);
+ inode_lock_shared(inode);
+
+ if (iocb->ki_flags & IOCB_DIRECT) {
+ file_accessed(iocb->ki_filp);
+ ret = iomap_dio_rw(iocb, iter, &exfat_iomap_ops, NULL, 0,
+ NULL, 0);
+ } else {
+ ret = generic_file_read_iter(iocb, iter);
+ }
+
+ inode_unlock_shared(inode);
+
+ return ret;
}
static vm_fault_t exfat_page_mkwrite(struct vm_fault *vmf)
@@ -860,6 +906,8 @@ static int exfat_file_open(struct inode *inode, struct file *filp)
if (err)
return err;
+ filp->f_mode |= FMODE_CAN_ODIRECT;
+
return 0;
}
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 6083ccef9408..e58561d65294 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -224,151 +224,6 @@ int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
return 0;
}
-static int exfat_get_block(struct inode *inode, sector_t iblock,
- struct buffer_head *bh_result, int create)
-{
- struct exfat_inode_info *ei = EXFAT_I(inode);
- struct super_block *sb = inode->i_sb;
- struct exfat_sb_info *sbi = EXFAT_SB(sb);
- unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits;
- int err = 0;
- unsigned long mapped_blocks = 0;
- unsigned int cluster, sec_offset, count;
- sector_t last_block;
- sector_t phys = 0;
- sector_t valid_blks;
- loff_t i_size;
-
- mutex_lock(&sbi->s_lock);
- i_size = i_size_read(inode);
- last_block = exfat_bytes_to_block_round_up(sb, i_size);
- if (iblock >= last_block && !create)
- goto done;
-
- /* Is this block already allocated? */
- count = exfat_bytes_to_cluster_round_up(sbi, bh_result->b_size);
- err = exfat_map_cluster(inode, iblock >> sbi->sect_per_clus_bits,
- &cluster, &count, create, NULL);
- if (err) {
- if (err != -ENOSPC)
- exfat_fs_error_ratelimit(sb,
- "failed to bmap (inode : %p iblock : %llu, err : %d)",
- inode, (unsigned long long)iblock, err);
- goto unlock_ret;
- }
-
- if (cluster == EXFAT_EOF_CLUSTER)
- goto done;
-
- /* sector offset in cluster */
- sec_offset = iblock & (sbi->sect_per_clus - 1);
-
- phys = exfat_cluster_to_sector(sbi, cluster) + sec_offset;
- mapped_blocks = ((unsigned long)count << sbi->sect_per_clus_bits) - sec_offset;
- max_blocks = min(mapped_blocks, max_blocks);
-
- map_bh(bh_result, sb, phys);
- if (buffer_delay(bh_result))
- clear_buffer_delay(bh_result);
-
- /*
- * In most cases, we just need to set bh_result to mapped, unmapped
- * or new status as follows:
- * 1. i_size == valid_size
- * 2. write case (create == 1)
- * 3. direct_read (!bh_result->b_folio)
- * -> the unwritten part will be zeroed in exfat_direct_IO()
- *
- * Otherwise, in the case of buffered read, it is necessary to take
- * care the last nested block if valid_size is not equal to i_size.
- */
- if (i_size == ei->valid_size || create || !bh_result->b_folio)
- valid_blks = exfat_bytes_to_block_round_up(sb, ei->valid_size);
- else
- valid_blks = exfat_bytes_to_block(sb, ei->valid_size);
-
- /* The range has been fully written, map it */
- if (iblock + max_blocks < valid_blks)
- goto done;
-
- /* The range has been partially written, map the written part */
- if (iblock < valid_blks) {
- max_blocks = valid_blks - iblock;
- goto done;
- }
-
- /* The area has not been written, map and mark as new for create case */
- if (create) {
- set_buffer_new(bh_result);
- ei->valid_size = exfat_block_to_bytes(sb, iblock + max_blocks);
- mark_inode_dirty(inode);
- goto done;
- }
-
- /*
- * The area has just one block partially written.
- * In that case, we should read and fill the unwritten part of
- * a block with zero.
- */
- if (bh_result->b_folio && iblock == valid_blks &&
- (ei->valid_size & (sb->s_blocksize - 1))) {
- loff_t size, pos;
- void *addr;
-
- max_blocks = 1;
-
- /*
- * No buffer_head is allocated.
- * (1) bmap: It's enough to set blocknr without I/O.
- * (2) read: The unwritten part should be filled with zero.
- * If a folio does not have any buffers,
- * let's returns -EAGAIN to fallback to
- * block_read_full_folio() for per-bh IO.
- */
- if (!folio_buffers(bh_result->b_folio)) {
- err = -EAGAIN;
- goto done;
- }
-
- pos = exfat_block_to_bytes(sb, iblock);
- size = ei->valid_size - pos;
- addr = folio_address(bh_result->b_folio) +
- offset_in_folio(bh_result->b_folio, pos);
-
- /* Check if bh->b_data points to proper addr in folio */
- if (bh_result->b_data != addr) {
- exfat_fs_error_ratelimit(sb,
- "b_data(%p) != folio_addr(%p)",
- bh_result->b_data, addr);
- err = -EINVAL;
- goto done;
- }
-
- /* Read a block */
- err = bh_read(bh_result, 0);
- if (err < 0)
- goto done;
-
- /* Zero unwritten part of a block */
- memset(bh_result->b_data + size, 0, bh_result->b_size - size);
- err = 0;
- goto done;
- }
-
- /*
- * The area has not been written, clear mapped for read/bmap cases.
- * If so, it will be filled with zero without reading from disk.
- */
- clear_buffer_mapped(bh_result);
-done:
- bh_result->b_size = exfat_block_to_bytes(sb, max_blocks);
- if (err < 0)
- clear_buffer_mapped(bh_result);
-unlock_ret:
- mutex_unlock(&sbi->s_lock);
- return err;
-}
-
static int exfat_read_folio(struct file *file, struct folio *folio)
{
struct iomap_read_folio_ctx ctx = {
@@ -415,60 +270,6 @@ static int exfat_writepages(struct address_space *mapping,
return iomap_writepages(&wpc);
}
-static void exfat_write_failed(struct address_space *mapping, loff_t to)
-{
- struct inode *inode = mapping->host;
-
- if (to > i_size_read(inode)) {
- truncate_pagecache(inode, i_size_read(inode));
- inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
- exfat_truncate(inode);
- }
-}
-
-static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
- struct address_space *mapping = iocb->ki_filp->f_mapping;
- struct inode *inode = mapping->host;
- struct exfat_inode_info *ei = EXFAT_I(inode);
- loff_t pos = iocb->ki_pos;
- loff_t size = pos + iov_iter_count(iter);
- int rw = iov_iter_rw(iter);
- ssize_t ret;
-
- /*
- * Need to use the DIO_LOCKING for avoiding the race
- * condition of exfat_get_block() and ->truncate().
- */
- ret = blockdev_direct_IO(iocb, inode, iter, exfat_get_block);
- if (ret < 0) {
- if (rw == WRITE && ret != -EIOCBQUEUED)
- exfat_write_failed(mapping, size);
-
- return ret;
- }
-
- size = pos + ret;
-
- if (rw == WRITE) {
- /*
- * If the block had been partially written before this write,
- * ->valid_size will not be updated in exfat_get_block(),
- * update it here.
- */
- if (ei->valid_size < size) {
- ei->valid_size = size;
- mark_inode_dirty(inode);
- }
- } else if (pos < ei->valid_size && ei->valid_size < size) {
- /* zero the unwritten part in the partially written block */
- iov_iter_revert(iter, size - ei->valid_size);
- iov_iter_zero(size - ei->valid_size, iter);
- }
-
- return ret;
-}
-
static sector_t exfat_aop_bmap(struct address_space *mapping, sector_t block)
{
sector_t blocknr;
@@ -491,7 +292,6 @@ static const struct address_space_operations exfat_aops = {
.error_remove_folio = generic_error_remove_folio,
.release_folio = iomap_release_folio,
.invalidate_folio = iomap_invalidate_folio,
- .direct_IO = exfat_direct_IO,
};
static inline unsigned long exfat_hash(loff_t i_pos)
diff --git a/fs/exfat/iomap.c b/fs/exfat/iomap.c
index 0c5aadfd4132..69308d66c55a 100644
--- a/fs/exfat/iomap.c
+++ b/fs/exfat/iomap.c
@@ -12,6 +12,32 @@
#include "exfat_fs.h"
#include "iomap.h"
+/*
+ * exfat_file_write_dio_end_io - Direct I/O write completion handler
+ *
+ * Updates i_size if the write extended the file. Called from the dio layer
+ * after I/O completion.
+ */
+static int exfat_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
+ int error, unsigned int flags)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ if (error)
+ return error;
+
+ if (size && i_size_read(inode) < iocb->ki_pos + size) {
+ i_size_write(inode, iocb->ki_pos + size);
+ mark_inode_dirty(inode);
+ }
+
+ return 0;
+}
+
+const struct iomap_dio_ops exfat_write_dio_ops = {
+ .end_io = exfat_file_write_dio_end_io,
+};
+
static int __exfat_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
unsigned int flags, struct iomap *iomap, bool may_alloc)
{
diff --git a/fs/exfat/iomap.h b/fs/exfat/iomap.h
index 7f8dcbe20a17..830388f386f4 100644
--- a/fs/exfat/iomap.h
+++ b/fs/exfat/iomap.h
@@ -6,6 +6,7 @@
#ifndef _LINUX_EXFAT_IOMAP_H
#define _LINUX_EXFAT_IOMAP_H
+extern const struct iomap_dio_ops exfat_write_dio_ops;
extern const struct iomap_ops exfat_iomap_ops;
extern const struct iomap_ops exfat_write_iomap_ops;
extern const struct iomap_writeback_ops exfat_writeback_ops;
--
2.25.1
* [PATCH v2 9/9] exfat: add support for SEEK_HOLE and SEEK_DATA in llseek
2026-05-07 12:42 [PATCH v2 0/9] exfat: convert to iomap Namjae Jeon
` (7 preceding siblings ...)
2026-05-07 12:42 ` [PATCH v2 8/9] exfat: add iomap direct " Namjae Jeon
@ 2026-05-07 12:42 ` Namjae Jeon
8 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 12:42 UTC (permalink / raw)
To: sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel,
Namjae Jeon
Add exfat_file_llseek(), which implements SEEK_HOLE and SEEK_DATA via
the iomap layer (iomap_seek_hole() and iomap_seek_data()) using the
existing exfat_iomap_ops.
Unlike many other modern filesystems, exFAT does not support sparse files
with unallocated clusters (holes). In exFAT, clusters are always fully
allocated once they are written or preallocated. In addition, exFAT
maintains a separate "Valid Data Length" (valid_size) that is distinct
from the logical file size. This affects how holes are reported during
seeking. In __exfat_iomap_begin(), ranges where the offset is greater
than or equal to ei->valid_size are mapped as IOMAP_UNWRITTEN for normal
I/O, while ranges below valid_size are mapped as IOMAP_MAPPED. When
IOMAP_REPORT is set, as it is for the iomap seek functions, the mapping
used to report SEEK_HOLE and SEEK_DATA positions is:
- Ranges with offset >= ei->valid_size are mapped as IOMAP_HOLE.
- Ranges with offset < ei->valid_size are mapped as IOMAP_MAPPED and
  clipped to end exactly at valid_size.
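The mapping rules above can be sketched as a small userspace model. This is
only an illustration under stated assumptions: the names model_map_type(),
model_clip_length() and the MODEL_* values are hypothetical stand-ins, not
kernel identifiers, and the block size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's IOMAP_* types; illustration only. */
enum model_type { MODEL_MAPPED, MODEL_UNWRITTEN, MODEL_HOLE };

/*
 * Ranges starting at or beyond valid_size are reported as a hole when
 * 'report' is true (the seek case) and as unwritten for normal I/O.
 * Ranges starting below valid_size are mapped.
 */
static enum model_type model_map_type(int64_t offset, int64_t valid_size,
				      bool report)
{
	if (offset >= valid_size)
		return report ? MODEL_HOLE : MODEL_UNWRITTEN;
	return MODEL_MAPPED;
}

/*
 * A mapped extent that straddles valid_size is shortened: to the exact
 * byte boundary for seek, or to the next block boundary for normal I/O.
 * Extents that do not straddle valid_size keep their length.
 */
static int64_t model_clip_length(int64_t offset, int64_t length,
				 int64_t valid_size, int64_t block_size,
				 bool report)
{
	/* Assumption of this sketch: block_size is a power of two. */
	assert(block_size > 0 && (block_size & (block_size - 1)) == 0);

	if (offset < valid_size && offset + length > valid_size) {
		if (report)
			return valid_size - offset;
		return ((valid_size + block_size - 1) & ~(block_size - 1)) -
		       offset;
	}
	return length;
}
```

With valid_size = 5000 and 512-byte blocks, an extent covering bytes 0..8191
would end at byte 5000 in the seek case and at byte 5120 in the normal I/O
case, which is why SEEK_HOLE can return the exact byte position.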
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
---
fs/exfat/file.c | 25 ++++++++++++++++++++++++-
fs/exfat/iomap.c | 32 ++++++++++++++++++++++++++++----
2 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index c4e6afc21bfe..c1567d206253 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -911,9 +911,32 @@ static int exfat_file_open(struct inode *inode, struct file *filp)
return 0;
}
+static loff_t exfat_file_llseek(struct file *file, loff_t offset, int whence)
+{
+ struct inode *inode = file->f_mapping->host;
+
+ switch (whence) {
+ case SEEK_HOLE:
+ inode_lock_shared(inode);
+ offset = iomap_seek_hole(inode, offset, &exfat_iomap_ops);
+ inode_unlock_shared(inode);
+ break;
+ case SEEK_DATA:
+ inode_lock_shared(inode);
+ offset = iomap_seek_data(inode, offset, &exfat_iomap_ops);
+ inode_unlock_shared(inode);
+ break;
+ default:
+ return generic_file_llseek(file, offset, whence);
+ }
+ if (offset < 0)
+ return offset;
+ return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
+}
+
const struct file_operations exfat_file_operations = {
.open = exfat_file_open,
- .llseek = generic_file_llseek,
+ .llseek = exfat_file_llseek,
.read_iter = exfat_file_read_iter,
.write_iter = exfat_file_write_iter,
.unlocked_ioctl = exfat_ioctl,
diff --git a/fs/exfat/iomap.c b/fs/exfat/iomap.c
index 69308d66c55a..9b1499a30d39 100644
--- a/fs/exfat/iomap.c
+++ b/fs/exfat/iomap.c
@@ -79,15 +79,39 @@ static int __exfat_iomap_begin(struct inode *inode, loff_t offset, loff_t length
else if (iomap->offset + iomap->length >= ei->valid_size)
iomap->flags = IOMAP_F_ZERO_TAIL;
} else {
+ /*
+ * valid_size is tracked in byte granularity and
+ * marks the exact boundary between valid data and
+ * holes (or unwritten space).
+ *
+ * When IOMAP_REPORT is set (used by lseek(SEEK_HOLE)
+ * and SEEK_DATA), we return IOMAP_HOLE. This allows
+ * iomap_seek_hole_iter() to directly return the
+ * precise byte position.
+ *
+ * For normal I/O paths (without IOMAP_REPORT) we
+ * return IOMAP_UNWRITTEN so the write path can
+ * distinguish it from a real hole.
+ */
if (offset >= ei->valid_size)
- iomap->type = IOMAP_UNWRITTEN;
+ iomap->type = flags & IOMAP_REPORT ?
+ IOMAP_HOLE : IOMAP_UNWRITTEN;
if (iomap->type == IOMAP_MAPPED &&
iomap->offset < ei->valid_size &&
iomap->offset + iomap->length > ei->valid_size) {
- iomap->length = round_up(ei->valid_size,
- 1 << inode->i_blkbits) -
- iomap->offset;
+ if (flags & IOMAP_REPORT) {
+ /*
+ * For SEEK_HOLE/SEEK_DATA, clip the length
+ * to the exact byte boundary (valid_size).
+ * This ensures the caller gets the precise
+ * hole position in byte units.
+ */
+ iomap->length = ei->valid_size - iomap->offset;
+ } else
+ iomap->length = round_up(ei->valid_size,
+ 1 << inode->i_blkbits) -
+ iomap->offset;
}
}
--
2.25.1
* Re: [PATCH v2 1/9] exfat: replace unsafe macros with static inline functions
2026-05-07 12:42 ` [PATCH v2 1/9] exfat: replace unsafe macros with static inline functions Namjae Jeon
@ 2026-05-07 13:41 ` CharSyam
2026-05-07 23:36 ` Namjae Jeon
0 siblings, 1 reply; 23+ messages in thread
From: CharSyam @ 2026-05-07 13:41 UTC (permalink / raw)
To: Namjae Jeon
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
Hi Namjae,
In fs/exfat/inode.c:
- inode->i_blocks += EXFAT_CLU_TO_B(num_to_be_allocated, sbi) >> 9;
+ inode->i_blocks +=
+ exfat_cluster_to_sectors(sbi, num_to_be_allocated) >> 9;
exfat_cluster_to_sectors() already returns a 512-byte block count, so
the extra >> 9 makes i_blocks 512 times too small.
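To make the arithmetic concrete, here is a minimal userspace sketch. The
model_* names are hypothetical and only mirror the shape of the expressions
quoted above; the 64-bit casts are an assumption of the sketch, not a claim
about the original macro.

```c
#include <assert.h>
#include <stdint.h>

/* Clusters -> 512-byte units, shaped like the helper in the patch. */
static uint64_t model_cluster_to_sectors(unsigned int cluster_size_bits,
					 uint32_t nr_clusters)
{
	/* A cluster is assumed to be at least 512 bytes. */
	assert(cluster_size_bits >= 9);
	return (uint64_t)nr_clusters << (cluster_size_bits - 9);
}

/* Old form: clusters -> bytes, then bytes -> 512-byte units. */
static uint64_t model_old_delta(unsigned int cluster_size_bits,
				uint32_t nr_clusters)
{
	return ((uint64_t)nr_clusters << cluster_size_bits) >> 9;
}

/* Form posted in v2: the helper result is shifted right by 9 again. */
static uint64_t model_v2_delta(unsigned int cluster_size_bits,
			       uint32_t nr_clusters)
{
	return model_cluster_to_sectors(cluster_size_bits, nr_clusters) >> 9;
}
```

For 4 clusters of 32 KiB (cluster_size_bits = 15), the old form and the
helper alone both give 256, while the v2 form gives 0, so dropping the
trailing >> 9 would restore the previous value.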
On Thu, May 7, 2026 at 9:44 PM Namjae Jeon <linkinjeon@kernel.org> wrote:
>
> The current exFAT driver relies on various macros for unit conversions
> between clusters, blocks, sectors, and directory entries. These macros
> are structurally unsafe as they lack type enforcement and are prone to
> potential integer overflows during bit-shift operations, especially
> on 64-bit architectures. Replace all arithmetic macros with static inline
> functions to provide strict type checking and explicit casting.
>
> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
> ---
> fs/exfat/balloc.c | 2 +-
> fs/exfat/dir.c | 48 ++++++++---------
> fs/exfat/exfat_fs.h | 122 ++++++++++++++++++++++++++++++++------------
> fs/exfat/fatent.c | 4 +-
> fs/exfat/file.c | 8 +--
> fs/exfat/inode.c | 19 +++----
> fs/exfat/namei.c | 26 +++++-----
> fs/exfat/super.c | 4 +-
> 8 files changed, 147 insertions(+), 86 deletions(-)
>
> diff --git a/fs/exfat/balloc.c b/fs/exfat/balloc.c
> index 625f2f14d4fe..e66ebf899778 100644
> --- a/fs/exfat/balloc.c
> +++ b/fs/exfat/balloc.c
> @@ -112,7 +112,7 @@ static int exfat_allocate_bitmap(struct super_block *sb,
> }
>
> if (exfat_test_bitmap_range(sb, sbi->map_clu,
> - EXFAT_B_TO_CLU_ROUND_UP(map_size, sbi)) == false)
> + exfat_bytes_to_cluster_round_up(sbi, map_size)) == false)
> goto err_out;
>
> return 0;
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index ac008ccaa97d..ca9d707220df 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c
> @@ -76,7 +76,7 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
> struct super_block *sb = inode->i_sb;
> struct exfat_sb_info *sbi = EXFAT_SB(sb);
> struct exfat_inode_info *ei = EXFAT_I(inode);
> - unsigned int dentry = EXFAT_B_TO_DEN(*cpos) & 0xFFFFFFFF;
> + unsigned int dentry = exfat_bytes_to_dentries(*cpos) & 0xFFFFFFFF;
> struct buffer_head *bh;
>
> /* check if the given file ID is opened */
> @@ -84,13 +84,13 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
> return -EPERM;
>
> exfat_chain_set(&dir, ei->start_clu,
> - EXFAT_B_TO_CLU(i_size_read(inode), sbi), ei->flags);
> + exfat_bytes_to_cluster(sbi, i_size_read(inode)), ei->flags);
>
> dentries_per_clu = sbi->dentries_per_clu;
> - max_dentries = (unsigned int)min_t(u64, MAX_EXFAT_DENTRIES,
> - (u64)EXFAT_CLU_TO_DEN(sbi->num_clusters, sbi));
> + max_dentries = min(MAX_EXFAT_DENTRIES,
> + exfat_cluster_to_dentries(sbi, sbi->num_clusters));
>
> - clu_offset = EXFAT_DEN_TO_CLU(dentry, sbi);
> + clu_offset = exfat_dentries_to_cluster(sbi, dentry);
> exfat_chain_dup(&clu, &dir);
>
> if (clu.flags == ALLOC_FAT_CHAIN) {
> @@ -147,10 +147,10 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
> dir_entry->dir = clu;
> brelse(bh);
>
> - ei->hint_bmap.off = EXFAT_DEN_TO_CLU(dentry, sbi);
> + ei->hint_bmap.off = exfat_dentries_to_cluster(sbi, dentry);
> ei->hint_bmap.clu = clu.dir;
>
> - *cpos = EXFAT_DEN_TO_B(dentry + 1 + num_ext);
> + *cpos = exfat_dentries_to_bytes(dentry + 1 + num_ext);
> return 0;
> }
>
> @@ -160,7 +160,7 @@ static int exfat_readdir(struct inode *inode, loff_t *cpos, struct exfat_dir_ent
>
> out:
> dir_entry->namebuf.lfn[0] = '\0';
> - *cpos = EXFAT_DEN_TO_B(dentry);
> + *cpos = exfat_dentries_to_bytes(dentry);
> return 0;
> }
>
> @@ -465,7 +465,7 @@ static void exfat_free_benign_secondary_clusters(struct inode *inode,
> return;
>
> exfat_chain_set(&dir, start_clu,
> - EXFAT_B_TO_CLU_ROUND_UP(size, EXFAT_SB(sb)),
> + exfat_bytes_to_cluster_round_up(EXFAT_SB(sb), size),
> flags);
> exfat_free_cluster(inode, &dir);
> }
> @@ -556,10 +556,11 @@ static int exfat_find_location(struct super_block *sb, struct exfat_chain *p_dir
> unsigned int off, clu = 0;
> struct exfat_sb_info *sbi = EXFAT_SB(sb);
>
> - off = EXFAT_DEN_TO_B(entry);
> + off = exfat_dentries_to_bytes(entry);
>
> clu = p_dir->dir;
> - ret = exfat_cluster_walk(sb, &clu, EXFAT_B_TO_CLU(off, sbi), p_dir->flags);
> + ret = exfat_cluster_walk(sb, &clu, exfat_bytes_to_cluster(sbi, off),
> + p_dir->flags);
> if (ret)
> return ret;
>
> @@ -567,7 +568,7 @@ static int exfat_find_location(struct super_block *sb, struct exfat_chain *p_dir
> exfat_fs_error(sb,
> "unexpected early break in cluster chain (clu : %u, len : %d)",
> p_dir->dir,
> - EXFAT_B_TO_CLU(off, sbi));
> + exfat_bytes_to_cluster(sbi, off));
> return -EIO;
> }
>
> @@ -577,13 +578,13 @@ static int exfat_find_location(struct super_block *sb, struct exfat_chain *p_dir
> }
>
> /* byte offset in cluster */
> - off = EXFAT_CLU_OFFSET(off, sbi);
> + off = exfat_cluster_offset(sbi, off);
>
> /* byte offset in sector */
> - *offset = EXFAT_BLK_OFFSET(off, sb);
> + *offset = exfat_block_offset(sb, off);
>
> /* sector offset in cluster */
> - *sector = EXFAT_B_TO_BLK(off, sb);
> + *sector = exfat_bytes_to_block(sb, off);
> *sector += exfat_cluster_to_sector(sbi, clu);
> return 0;
> }
> @@ -593,7 +594,7 @@ struct exfat_dentry *exfat_get_dentry(struct super_block *sb,
> {
> struct exfat_sb_info *sbi = EXFAT_SB(sb);
> unsigned int sect_per_clus = sbi->sect_per_clus;
> - unsigned int dentries_per_page = EXFAT_B_TO_DEN(PAGE_SIZE);
> + unsigned int dentries_per_page = exfat_bytes_to_dentries(PAGE_SIZE);
> int off;
> sector_t sec;
>
> @@ -672,8 +673,8 @@ struct exfat_dentry *exfat_get_dentry_cached(
> struct exfat_entry_set_cache *es, int num)
> {
> int off = es->start_off + num * DENTRY_SIZE;
> - struct buffer_head *bh = es->bh[EXFAT_B_TO_BLK(off, es->sb)];
> - char *p = bh->b_data + EXFAT_BLK_OFFSET(off, es->sb);
> + struct buffer_head *bh = es->bh[exfat_bytes_to_block(es->sb, off)];
> + char *p = bh->b_data + exfat_block_offset(es->sb, off);
>
> return (struct exfat_dentry *)p;
> }
> @@ -741,7 +742,7 @@ static int __exfat_get_dentry_set(struct exfat_entry_set_cache *es,
>
> es->num_entries = num_entries;
>
> - num_bh = EXFAT_B_TO_BLK_ROUND_UP(off + num_entries * DENTRY_SIZE, sb);
> + num_bh = exfat_bytes_to_block_round_up(sb, off + num_entries * DENTRY_SIZE);
> if (num_bh > ARRAY_SIZE(es->__bh)) {
> es->bh = kmalloc_objs(*es->bh, num_bh, GFP_NOFS);
> if (!es->bh) {
> @@ -830,7 +831,7 @@ static int exfat_validate_empty_dentry_set(struct exfat_entry_set_cache *es)
>
> err_used_follow_unused:
> off = es->start_off + (i << DENTRY_SIZE_BITS);
> - bh = es->bh[EXFAT_B_TO_BLK(off, es->sb)];
> + bh = es->bh[exfat_bytes_to_block(es->sb, off)];
>
> exfat_fs_error(es->sb,
> "in sector %lld, dentry %d should be unused, but 0x%x",
> @@ -839,7 +840,8 @@ static int exfat_validate_empty_dentry_set(struct exfat_entry_set_cache *es)
> return -EIO;
>
> count_skip_entries:
> - es->num_entries = EXFAT_B_TO_DEN(EXFAT_BLK_TO_B(es->num_bh, es->sb) - es->start_off);
> + es->num_entries =
> + exfat_bytes_to_dentries(exfat_block_to_bytes(es->sb, es->num_bh) - es->start_off);
> for (; i < es->num_entries; i++) {
> ep = exfat_get_dentry_cached(es, i);
> if (IS_EXFAT_DELETED(ep->type))
> @@ -892,7 +894,7 @@ static inline void exfat_set_empty_hint(struct exfat_inode_info *ei,
> {
> if (ei->hint_femp.eidx == EXFAT_HINT_NONE ||
> ei->hint_femp.eidx > dentry) {
> - int total_entries = EXFAT_B_TO_DEN(i_size_read(&ei->vfs_inode));
> + int total_entries = exfat_bytes_to_dentries(i_size_read(&ei->vfs_inode));
>
> if (candi_empty->count == 0) {
> candi_empty->cur = *clu;
> @@ -1215,7 +1217,7 @@ static int exfat_get_volume_label_dentry(struct super_block *sb,
> es->bh = es->__bh;
> es->bh[0] = bh;
> es->num_bh = 1;
> - es->start_off = EXFAT_DEN_TO_B(i) % sb->s_blocksize;
> + es->start_off = exfat_dentries_to_bytes(i) % sb->s_blocksize;
>
> return 0;
> }
> diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
> index 89ef5368277f..9c8ab3df7a42 100644
> --- a/fs/exfat/exfat_fs.h
> +++ b/fs/exfat/exfat_fs.h
> @@ -84,38 +84,6 @@ enum {
> (min_t(blkcnt_t, (sb)->s_bdi->ra_pages, (sb)->s_bdi->io_pages) \
> << (PAGE_SHIFT - (sb)->s_blocksize_bits))
>
> -/*
> - * helpers for cluster size to byte conversion.
> - */
> -#define EXFAT_CLU_TO_B(b, sbi) ((b) << (sbi)->cluster_size_bits)
> -#define EXFAT_B_TO_CLU(b, sbi) ((b) >> (sbi)->cluster_size_bits)
> -#define EXFAT_B_TO_CLU_ROUND_UP(b, sbi) \
> - (((b - 1) >> (sbi)->cluster_size_bits) + 1)
> -#define EXFAT_CLU_OFFSET(off, sbi) ((off) & ((sbi)->cluster_size - 1))
> -
> -/*
> - * helpers for block size to byte conversion.
> - */
> -#define EXFAT_BLK_TO_B(b, sb) ((b) << (sb)->s_blocksize_bits)
> -#define EXFAT_B_TO_BLK(b, sb) ((b) >> (sb)->s_blocksize_bits)
> -#define EXFAT_B_TO_BLK_ROUND_UP(b, sb) \
> - (((b - 1) >> (sb)->s_blocksize_bits) + 1)
> -#define EXFAT_BLK_OFFSET(off, sb) ((off) & ((sb)->s_blocksize - 1))
> -
> -/*
> - * helpers for block size to dentry size conversion.
> - */
> -#define EXFAT_B_TO_DEN(b) ((b) >> DENTRY_SIZE_BITS)
> -#define EXFAT_DEN_TO_B(b) ((b) << DENTRY_SIZE_BITS)
> -
> -/*
> - * helpers for cluster size to dentry size conversion.
> - */
> -#define EXFAT_CLU_TO_DEN(clu, sbi) \
> - ((clu) << ((sbi)->cluster_size_bits - DENTRY_SIZE_BITS))
> -#define EXFAT_DEN_TO_CLU(dentry, sbi) \
> - ((dentry) >> ((sbi)->cluster_size_bits - DENTRY_SIZE_BITS))
> -
> /*
> * helpers for fat entry.
> */
> @@ -149,7 +117,7 @@ enum {
> * The 608 bytes are in 3 sectors at most (even 512 Byte sector).
> */
> #define DIR_CACHE_SIZE \
> - (DIV_ROUND_UP(EXFAT_DEN_TO_B(ES_MAX_ENTRY_NUM), SECTOR_SIZE) + 1)
> + (DIV_ROUND_UP(ES_MAX_ENTRY_NUM << DENTRY_SIZE_BITS, SECTOR_SIZE) + 1)
>
> /* Superblock flags */
> #define EXFAT_FLAGS_SHUTDOWN 1
> @@ -432,6 +400,94 @@ static inline loff_t exfat_ondisk_size(const struct inode *inode)
> return ((loff_t)inode->i_blocks) << 9;
> }
>
> +/*
> + * helpers for cluster size to byte conversion.
> + */
> +static inline loff_t exfat_cluster_to_bytes(struct exfat_sb_info *sbi,
> + u32 nr_clusters)
> +{
> + return (loff_t)nr_clusters << sbi->cluster_size_bits;
> +}
> +
> +static inline blkcnt_t exfat_cluster_to_sectors(struct exfat_sb_info *sbi,
> + u32 nr_clusters)
> +{
> + return (blkcnt_t)nr_clusters << (sbi->cluster_size_bits - 9);
> +}
> +
> +static inline u32 exfat_bytes_to_cluster(struct exfat_sb_info *sbi, loff_t size)
> +{
> + return (u32)(size >> sbi->cluster_size_bits);
> +}
> +
> +static inline u32 exfat_bytes_to_cluster_round_up(struct exfat_sb_info *sbi,
> + loff_t size)
> +{
> + if (size <= 0)
> + return 0;
> + return (u32)((size - 1) >> sbi->cluster_size_bits) + 1;
> +}
> +
> +static inline u32 exfat_cluster_offset(struct exfat_sb_info *sbi, loff_t off)
> +{
> + return off & (sbi->cluster_size - 1);
> +}
> +
> +/*
> + * helpers for block size to byte conversion.
> + */
> +static inline loff_t exfat_block_to_bytes(struct super_block *sb,
> + sector_t block)
> +{
> + return (loff_t)block << sb->s_blocksize_bits;
> +}
> +
> +static inline sector_t exfat_bytes_to_block(struct super_block *sb, loff_t size)
> +{
> + return (sector_t)(size >> sb->s_blocksize_bits);
> +}
> +
> +static inline sector_t exfat_bytes_to_block_round_up(struct super_block *sb,
> + loff_t size)
> +{
> + if (size <= 0)
> + return 0;
> + return (sector_t)(((size - 1) >> sb->s_blocksize_bits) + 1);
> +}
> +
> +static inline u32 exfat_block_offset(struct super_block *sb, loff_t off)
> +{
> + return (u32)(off & (sb->s_blocksize - 1));
> +}
> +
> +/*
> + * helpers for block size to dentry size conversion.
> + */
> +static inline u32 exfat_bytes_to_dentries(loff_t b)
> +{
> + return (u32)(b >> DENTRY_SIZE_BITS);
> +}
> +
> +static inline u32 exfat_dentries_to_bytes(u32 dentry)
> +{
> + return dentry << DENTRY_SIZE_BITS;
> +}
> +
> +/*
> + * helpers for cluster size to dentry size conversion.
> + */
> +static inline u32 exfat_cluster_to_dentries(struct exfat_sb_info *sbi,
> + u32 nr_clusters)
> +{
> + return nr_clusters << (sbi->cluster_size_bits - DENTRY_SIZE_BITS);
> +}
> +
> +static inline u32 exfat_dentries_to_cluster(struct exfat_sb_info *sbi,
> + u32 dentry)
> +{
> + return dentry >> (sbi->cluster_size_bits - DENTRY_SIZE_BITS);
> +}
> +
> /* super.c */
> int exfat_set_volume_dirty(struct super_block *sb);
> int exfat_clear_volume_dirty(struct super_block *sb);
> diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
> index dce0955e689a..45b0b754a2e4 100644
> --- a/fs/exfat/fatent.c
> +++ b/fs/exfat/fatent.c
> @@ -412,8 +412,8 @@ int exfat_zeroed_cluster(struct inode *dir, unsigned int clu)
>
> if (IS_DIRSYNC(dir))
> return sync_blockdev_range(sb->s_bdev,
> - EXFAT_BLK_TO_B(blknr, sb),
> - EXFAT_BLK_TO_B(last_blknr, sb) - 1);
> + exfat_block_to_bytes(sb, blknr),
> + exfat_block_to_bytes(sb, last_blknr) - 1);
>
> return 0;
> }
> diff --git a/fs/exfat/file.c b/fs/exfat/file.c
> index 354bdcfe4abc..29a36a80e29b 100644
> --- a/fs/exfat/file.c
> +++ b/fs/exfat/file.c
> @@ -33,9 +33,9 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
> if (ret)
> return ret;
>
> - num_clusters = EXFAT_B_TO_CLU(exfat_ondisk_size(inode), sbi);
> + num_clusters = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
> /* integer overflow is already checked in inode_newsize_ok(). */
> - new_num_clusters = EXFAT_B_TO_CLU_ROUND_UP(size, sbi);
> + new_num_clusters = exfat_bytes_to_cluster_round_up(sbi, size);
>
> if (new_num_clusters == num_clusters)
> goto out;
> @@ -200,8 +200,8 @@ int __exfat_truncate(struct inode *inode)
>
> exfat_set_volume_dirty(sb);
>
> - num_clusters_new = EXFAT_B_TO_CLU_ROUND_UP(i_size_read(inode), sbi);
> - num_clusters_phys = EXFAT_B_TO_CLU(exfat_ondisk_size(inode), sbi);
> + num_clusters_new = exfat_bytes_to_cluster_round_up(sbi, i_size_read(inode));
> + num_clusters_phys = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
>
> exfat_chain_set(&clu, ei->start_clu, num_clusters_phys, ei->flags);
>
> diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
> index 1ea4c740fef9..d25e0a69865b 100644
> --- a/fs/exfat/inode.c
> +++ b/fs/exfat/inode.c
> @@ -135,7 +135,7 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
> unsigned int local_clu_offset = clu_offset;
> unsigned int num_to_be_allocated = 0, num_clusters;
>
> - num_clusters = EXFAT_B_TO_CLU(exfat_ondisk_size(inode), sbi);
> + num_clusters = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
>
> if (clu_offset >= num_clusters)
> num_to_be_allocated = clu_offset - num_clusters + 1;
> @@ -216,7 +216,8 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
>
> *clu = new_clu.dir;
>
> - inode->i_blocks += EXFAT_CLU_TO_B(num_to_be_allocated, sbi) >> 9;
> + inode->i_blocks +=
> + exfat_cluster_to_sectors(sbi, num_to_be_allocated) >> 9;
>
> /*
> * Move *clu pointer along FAT chains (hole care) because the
> @@ -254,12 +255,12 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
>
> mutex_lock(&sbi->s_lock);
> i_size = i_size_read(inode);
> - last_block = EXFAT_B_TO_BLK_ROUND_UP(i_size, sb);
> + last_block = exfat_bytes_to_block_round_up(sb, i_size);
> if (iblock >= last_block && !create)
> goto done;
>
> /* Is this block already allocated? */
> - count = EXFAT_B_TO_CLU_ROUND_UP(bh_result->b_size, sbi);
> + count = exfat_bytes_to_cluster_round_up(sbi, bh_result->b_size);
> err = exfat_map_cluster(inode, iblock >> sbi->sect_per_clus_bits,
> &cluster, &count, create);
> if (err) {
> @@ -296,9 +297,9 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
> * care the last nested block if valid_size is not equal to i_size.
> */
> if (i_size == ei->valid_size || create || !bh_result->b_folio)
> - valid_blks = EXFAT_B_TO_BLK_ROUND_UP(ei->valid_size, sb);
> + valid_blks = exfat_bytes_to_block_round_up(sb, ei->valid_size);
> else
> - valid_blks = EXFAT_B_TO_BLK(ei->valid_size, sb);
> + valid_blks = exfat_bytes_to_block(sb, ei->valid_size);
>
> /* The range has been fully written, map it */
> if (iblock + max_blocks < valid_blks)
> @@ -313,7 +314,7 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
> /* The area has not been written, map and mark as new for create case */
> if (create) {
> set_buffer_new(bh_result);
> - ei->valid_size = EXFAT_BLK_TO_B(iblock + max_blocks, sb);
> + ei->valid_size = exfat_block_to_bytes(sb, iblock + max_blocks);
> mark_inode_dirty(inode);
> goto done;
> }
> @@ -343,7 +344,7 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
> goto done;
> }
>
> - pos = EXFAT_BLK_TO_B(iblock, sb);
> + pos = exfat_block_to_bytes(sb, iblock);
> size = ei->valid_size - pos;
> addr = folio_address(bh_result->b_folio) +
> offset_in_folio(bh_result->b_folio, pos);
> @@ -374,7 +375,7 @@ static int exfat_get_block(struct inode *inode, sector_t iblock,
> */
> clear_buffer_mapped(bh_result);
> done:
> - bh_result->b_size = EXFAT_BLK_TO_B(max_blocks, sb);
> + bh_result->b_size = exfat_block_to_bytes(sb, max_blocks);
> if (err < 0)
> clear_buffer_mapped(bh_result);
> unlock_ret:
> diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
> index 2c5636634b4a..752fbec9316b 100644
> --- a/fs/exfat/namei.c
> +++ b/fs/exfat/namei.c
> @@ -208,7 +208,7 @@ static int exfat_search_empty_slot(struct super_block *sb,
> int dentries_per_clu;
> struct exfat_chain clu;
> struct exfat_sb_info *sbi = EXFAT_SB(sb);
> - int total_entries = EXFAT_CLU_TO_DEN(p_dir->size, sbi);
> + unsigned int total_entries = exfat_cluster_to_dentries(sbi, p_dir->size);
>
> dentries_per_clu = sbi->dentries_per_clu;
>
> @@ -266,7 +266,7 @@ static int exfat_search_empty_slot(struct super_block *sb,
>
> static int exfat_check_max_dentries(struct inode *inode)
> {
> - if (EXFAT_B_TO_DEN(i_size_read(inode)) >= MAX_EXFAT_DENTRIES) {
> + if (exfat_bytes_to_dentries(i_size_read(inode)) >= MAX_EXFAT_DENTRIES) {
> /*
> * exFAT spec allows a dir to grow up to 8388608(256MB)
> * dentries
> @@ -314,7 +314,8 @@ int exfat_find_empty_entry(struct inode *inode,
> }
>
> exfat_chain_set(p_dir, ei->start_clu,
> - EXFAT_B_TO_CLU(i_size_read(inode), sbi), ei->flags);
> + exfat_bytes_to_cluster(sbi, i_size_read(inode)),
> + ei->flags);
>
> while ((dentry = exfat_search_empty_slot(sb, &hint_femp, p_dir,
> num_entries, es)) < 0) {
> @@ -375,7 +376,7 @@ int exfat_find_empty_entry(struct inode *inode,
>
> hint_femp.cur.size++;
> p_dir->size++;
> - size = EXFAT_CLU_TO_B(p_dir->size, sbi);
> + size = exfat_cluster_to_bytes(sbi, p_dir->size);
>
> /* directory inode should be updated in here */
> i_size_write(inode, size);
> @@ -604,7 +605,7 @@ static int exfat_find(struct inode *dir, const struct qstr *qname,
> return ret;
>
> exfat_chain_set(&cdir, ei->start_clu,
> - EXFAT_B_TO_CLU(i_size_read(dir), sbi), ei->flags);
> + exfat_bytes_to_cluster(sbi, i_size_read(dir)), ei->flags);
>
> /* check the validation of hint_stat and initialize it if required */
> if (ei->version != (inode_peek_iversion_raw(dir) & 0xffffffff)) {
> @@ -681,7 +682,7 @@ static int exfat_find(struct inode *dir, const struct qstr *qname,
> return -EIO;
> }
>
> - if (unlikely(EXFAT_B_TO_CLU_ROUND_UP(info->size, sbi) > sbi->used_clusters)) {
> + if (unlikely(exfat_bytes_to_cluster_round_up(sbi, info->size) > sbi->used_clusters)) {
> exfat_fs_error(sb, "data size is invalid(%lld)", info->size);
> return -EIO;
> }
> @@ -695,7 +696,8 @@ static int exfat_find(struct inode *dir, const struct qstr *qname,
>
> if (info->type == TYPE_DIR) {
> exfat_chain_set(&cdir, info->start_clu,
> - EXFAT_B_TO_CLU(info->size, sbi), info->flags);
> + exfat_bytes_to_cluster(sbi, info->size),
> + info->flags);
> count = exfat_count_dir_entries(sb, &cdir);
> if (count < 0)
> return -EIO;
> @@ -951,7 +953,7 @@ static int exfat_rmdir(struct inode *dir, struct dentry *dentry)
> }
>
> exfat_chain_set(&clu_to_free, ei->start_clu,
> - EXFAT_B_TO_CLU_ROUND_UP(i_size_read(inode), sbi), ei->flags);
> + exfat_bytes_to_cluster_round_up(sbi, i_size_read(inode)), ei->flags);
>
> err = exfat_check_dir_empty(sb, &clu_to_free);
> if (err) {
> @@ -1158,8 +1160,8 @@ static int __exfat_rename(struct inode *old_parent_inode,
>
> new_clu.dir = new_ei->start_clu;
> new_clu.size =
> - EXFAT_B_TO_CLU_ROUND_UP(i_size_read(new_inode),
> - sbi);
> + exfat_bytes_to_cluster_round_up(sbi,
> + i_size_read(new_inode));
> new_clu.flags = new_ei->flags;
>
> ret = exfat_check_dir_empty(sb, &new_clu);
> @@ -1203,8 +1205,8 @@ static int __exfat_rename(struct inode *old_parent_inode,
> struct exfat_chain new_clu_to_free;
>
> exfat_chain_set(&new_clu_to_free, new_ei->start_clu,
> - EXFAT_B_TO_CLU_ROUND_UP(i_size_read(new_inode),
> - sbi), new_ei->flags);
> + exfat_bytes_to_cluster_round_up(sbi, i_size_read(new_inode)),
> + new_ei->flags);
>
> if (exfat_free_cluster(new_inode, &new_clu_to_free)) {
> /* just set I/O error only */
> diff --git a/fs/exfat/super.c b/fs/exfat/super.c
> index 95d87e2d7717..cb2f8eefff99 100644
> --- a/fs/exfat/super.c
> +++ b/fs/exfat/super.c
> @@ -369,7 +369,7 @@ static int exfat_read_root(struct inode *inode, struct exfat_chain *root_clu)
> ei->hint_stat.clu = sbi->root_dir;
> ei->hint_femp.eidx = EXFAT_HINT_NONE;
>
> - i_size_write(inode, EXFAT_CLU_TO_B(root_clu->size, sbi));
> + i_size_write(inode, exfat_cluster_to_bytes(sbi, root_clu->size));
>
> num_subdirs = exfat_count_dir_entries(sb, root_clu);
> if (num_subdirs < 0)
> @@ -538,7 +538,7 @@ static int exfat_read_boot_sector(struct super_block *sb)
> * machines.
> */
> sb->s_maxbytes = min(MAX_LFS_FILESIZE,
> - EXFAT_CLU_TO_B((loff_t)EXFAT_MAX_NUM_CLUSTER, sbi));
> + exfat_cluster_to_bytes(sbi, (loff_t)EXFAT_MAX_NUM_CLUSTER));
>
> /* check logical sector size */
> if (exfat_calibrate_blocksize(sb, 1 << p_boot->sect_size_bits))
> --
> 2.25.1
>
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 3/9] exfat: add exfat_file_open()
2026-05-07 12:42 ` [PATCH v2 3/9] exfat: add exfat_file_open() Namjae Jeon
@ 2026-05-07 13:52 ` CharSyam
2026-05-07 23:37 ` Namjae Jeon
0 siblings, 1 reply; 23+ messages in thread
From: CharSyam @ 2026-05-07 13:52 UTC (permalink / raw)
To: Namjae Jeon
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
Hi, namjae.
How about this? I think err is not needed.
static int exfat_file_open(struct inode *inode, struct file *filp)
{
if (unlikely(exfat_forced_shutdown(inode->i_sb)))
return -EIO;
return generic_file_open(inode, filp);
}
On Thu, May 7, 2026 at 9:45 PM Namjae Jeon <linkinjeon@kernel.org> wrote:
>
> Add exfat_file_open() to handle file open operation for exFAT.
> This change is a preparation step before introducing iomap-based direct
> IO support.
>
> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
> ---
> fs/exfat/file.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/fs/exfat/file.c b/fs/exfat/file.c
> index 29a36a80e29b..b7a4964631a0 100644
> --- a/fs/exfat/file.c
> +++ b/fs/exfat/file.c
> @@ -800,7 +800,22 @@ static ssize_t exfat_splice_read(struct file *in, loff_t *ppos,
> return filemap_splice_read(in, ppos, pipe, len, flags);
> }
>
> +static int exfat_file_open(struct inode *inode, struct file *filp)
> +{
> + int err;
> +
> + if (unlikely(exfat_forced_shutdown(inode->i_sb)))
> + return -EIO;
> +
> + err = generic_file_open(inode, filp);
> + if (err)
> + return err;
> +
> + return 0;
> +}
> +
> const struct file_operations exfat_file_operations = {
> + .open = exfat_file_open,
> .llseek = generic_file_llseek,
> .read_iter = exfat_file_read_iter,
> .write_iter = exfat_file_write_iter,
> --
> 2.25.1
>
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 4/9] exfat: add support for multi-cluster allocation
2026-05-07 12:42 ` [PATCH v2 4/9] exfat: add support for multi-cluster allocation Namjae Jeon
@ 2026-05-07 14:09 ` CharSyam
2026-05-08 0:27 ` Namjae Jeon
2026-05-10 13:32 ` Chi Zhiling
1 sibling, 1 reply; 23+ messages in thread
From: CharSyam @ 2026-05-07 14:09 UTC (permalink / raw)
To: Namjae Jeon
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
hi, Namjae.
The critical issue is that this hunk drops the hole-care walk in
exfat_map_cluster():
*clu = new_clu.dir;
+ *count = new_clu.size;
- inode->i_blocks +=
- exfat_cluster_to_sectors(sbi, num_to_be_allocated) >> 9;
-
- /*
- * Move *clu pointer along FAT chains (hole care) because the
- * caller of this function expect *clu to be the last cluster.
- * This only works when num_to_be_allocated >= 2,
- * *clu = (the first cluster of the allocated chain) =>
- * (the last cluster of ...)
- */
- if (exfat_cluster_walk(sb, clu, num_to_be_allocated - 1, ei->flags))
- return -EIO;
- *count = 1;
+ inode->i_blocks += exfat_cluster_to_sectors(sbi, new_clu.size);
The removed comment is explicit: the caller expects *clu to be the
cluster at the requested clu_offset, not necessarily the first cluster
of the newly allocated chain. The old code walked
num_to_be_allocated - 1 steps so that *clu ended up at clu_offset.
With this change, *clu is left as new_clu.dir (the first newly
allocated cluster) and *count is set to new_clu.size. That is only
correct for the sequential-extension case where
clu_offset == num_clusters. For clu_offset > num_clusters, the caller
will compute the physical sector from the wrong cluster:
phys = exfat_cluster_to_sector(sbi, cluster) + sec_offset;
This also interacts badly with the new contig=true allocation mode.
If exfat_alloc_cluster() stops early at a non-contiguous boundary,
new_clu.size can be smaller than num_to_be_allocated, so the allocated
run may not even reach clu_offset while exfat_map_cluster() still
returns success.
Could you keep the old hole-care behavior and extend it to the
multi-cluster case? After allocation, the code should advance from
new_clu.dir to the cluster corresponding to clu_offset, set *count to
the remaining number of clusters from that point, and handle partial
contiguous allocation by either continuing allocation or unwinding it
before returning an error.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/9] exfat: replace unsafe macros with static inline functions
2026-05-07 13:41 ` CharSyam
@ 2026-05-07 23:36 ` Namjae Jeon
0 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 23:36 UTC (permalink / raw)
To: CharSyam
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
On Thu, May 7, 2026 at 10:41 PM CharSyam <charsyam@gmail.com> wrote:
>
> Hi. Namjae.
Hi Daemyung,
>
> in fs/exfat/inode.c
>
> - inode->i_blocks += EXFAT_CLU_TO_B(num_to_be_allocated, sbi) >> 9;
> + inode->i_blocks +=
> + exfat_cluster_to_sectors(sbi, num_to_be_allocated) >> 9;
>
> exfat_cluster_to_sectors() already returns a 512-byte block count, so
> the extra >> 9 makes i_blocks 512 times too small.
Right, I will fix it in the next version.
Thanks.
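To make the size of the error concrete, here is a small user-space sketch (not kernel code). It assumes, as stated in the review above, that exfat_cluster_to_sectors() already yields a count in 512-byte units; the function names below are hypothetical stand-ins and not the real exFAT helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for exfat_cluster_to_sectors(): converts a cluster
 * count into 512-byte units (assumption taken from the review comment).
 */
static uint64_t cluster_to_512b_units(unsigned int cluster_size_bits,
				      uint64_t nr_clusters)
{
	assert(cluster_size_bits >= 9);	/* a cluster is at least 512 bytes */
	return nr_clusters << (cluster_size_bits - 9);
}

/* What the v2 hunk computes: the helper result shifted right by 9 again. */
static uint64_t i_blocks_delta_v2(unsigned int cluster_size_bits,
				  uint64_t nr_clusters)
{
	return cluster_to_512b_units(cluster_size_bits, nr_clusters) >> 9;
}

/* What the replaced macro expression computed: bytes >> 9. */
static uint64_t i_blocks_delta_orig(unsigned int cluster_size_bits,
				    uint64_t nr_clusters)
{
	return (nr_clusters << cluster_size_bits) >> 9;
}
```

With 4 KiB clusters (cluster_size_bits = 12), allocating 1024 clusters should add 8192 to i_blocks, but the double shift adds only 16. Allocating a single cluster adds 0.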
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 3/9] exfat: add exfat_file_open()
2026-05-07 13:52 ` CharSyam
@ 2026-05-07 23:37 ` Namjae Jeon
0 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-07 23:37 UTC (permalink / raw)
To: CharSyam
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
On Thu, May 7, 2026 at 10:52 PM CharSyam <charsyam@gmail.com> wrote:
>
> Hi, namjae.
>
> How about this? I think err is not needed.
>
> static int exfat_file_open(struct inode *inode, struct file *filp)
> {
> if (unlikely(exfat_forced_shutdown(inode->i_sb)))
> return -EIO;
>
> return generic_file_open(inode, filp);
> }
Okay, I will fix it in the next version.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 4/9] exfat: add support for multi-cluster allocation
2026-05-07 14:09 ` CharSyam
@ 2026-05-08 0:27 ` Namjae Jeon
0 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-08 0:27 UTC (permalink / raw)
To: CharSyam
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
On Thu, May 7, 2026 at 11:09 PM CharSyam <charsyam@gmail.com> wrote:
>
> hi, Namjae.
>
> The critical issue is that this hunk drops the hole-care walk in
> exfat_map_cluster():
>
> *clu = new_clu.dir;
> + *count = new_clu.size;
>
> - inode->i_blocks +=
> - exfat_cluster_to_sectors(sbi, num_to_be_allocated) >> 9;
> -
> - /*
> - * Move *clu pointer along FAT chains (hole care) because the
> - * caller of this function expect *clu to be the last cluster.
> - * This only works when num_to_be_allocated >= 2,
> - * *clu = (the first cluster of the allocated chain) =>
> - * (the last cluster of ...)
> - */
> - if (exfat_cluster_walk(sb, clu, num_to_be_allocated - 1, ei->flags))
> - return -EIO;
> - *count = 1;
> + inode->i_blocks += exfat_cluster_to_sectors(sbi, new_clu.size);
>
> The removed comment is explicit: the caller expects *clu to be the
> cluster at the requested clu_offset, not necessarily the first cluster
> of the newly allocated chain. The old code walked
> num_to_be_allocated - 1 steps so that *clu ended up at clu_offset.
>
> With this change, *clu is left as new_clu.dir (the first newly
> allocated cluster) and *count is set to new_clu.size. That is only
> correct for the sequential-extension case where
> clu_offset == num_clusters. For clu_offset > num_clusters, the caller
> will compute the physical sector from the wrong cluster:
>
> phys = exfat_cluster_to_sector(sbi, cluster) + sec_offset;
>
> This also interacts badly with the new contig=true allocation mode.
> If exfat_alloc_cluster() stops early at a non-contiguous boundary,
> new_clu.size can be smaller than num_to_be_allocated, so the allocated
> run may not even reach clu_offset while exfat_map_cluster() still
> returns success.
>
> Could you keep the old hole-care behavior and extend it to the
> multi-cluster case? After allocation, the code should advance from
> new_clu.dir to the cluster corresponding to clu_offset, set *count to
> the remaining number of clusters from that point, and handle partial
> contiguous allocation by either continuing allocation or unwinding it
> before returning an error.
exfat_map_cluster() returns a contiguous cluster range to the caller:
a starting cluster number (*clu) and the number of contiguous
clusters (*count) from that point. num_to_be_allocated is only the
maximum number of clusters the caller would like to allocate. If
exfat_alloc_cluster() returns fewer clusters than requested, that is
not an error.
Thanks.
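One reading of the contract described above can be sketched in user space as follows. This is a toy model, not the exFAT code: struct free_run, map_contig() and map_all() are hypothetical names, and the model only covers the sequential-extension case, so it does not address the clu_offset > num_clusters question raised in the review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one contiguous free run on disk. */
struct free_run {
	unsigned int start;	/* first free cluster number */
	unsigned int len;	/* number of free clusters in the run */
};

/*
 * Toy stand-in for a mapping call in allocation mode: hand out up to
 * *count clusters from the first non-empty run and report how many were
 * actually mapped. A short return is success, not an error.
 */
static int map_contig(struct free_run *runs, size_t nr_runs,
		      unsigned int *clu, unsigned int *count)
{
	assert(*count > 0);
	for (size_t i = 0; i < nr_runs; i++) {
		if (runs[i].len == 0)
			continue;
		if (*count > runs[i].len)
			*count = runs[i].len;	/* trim to the contiguous run */
		*clu = runs[i].start;
		runs[i].start += *count;
		runs[i].len -= *count;
		return 0;
	}
	return -1;	/* no space left in the model */
}

/* Caller side: keep asking until the whole request is covered. */
static int map_all(struct free_run *runs, size_t nr_runs,
		   unsigned int want, unsigned int *nr_extents)
{
	*nr_extents = 0;
	while (want > 0) {
		unsigned int clu, count = want;

		if (map_contig(runs, nr_runs, &clu, &count))
			return -1;
		want -= count;		/* the next call maps the remainder */
		(*nr_extents)++;
	}
	return 0;
}
```

In this model a request for 6 clusters against free runs of 3 and 5 clusters is satisfied in two calls, each returning one contiguous extent.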
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag
2026-05-07 12:42 ` [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag Namjae Jeon
@ 2026-05-09 9:59 ` Chi Zhiling
2026-05-09 14:30 ` Namjae Jeon
2026-05-11 12:45 ` Christoph Hellwig
1 sibling, 1 reply; 23+ messages in thread
From: Chi Zhiling @ 2026-05-09 9:59 UTC (permalink / raw)
To: Namjae Jeon, sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel
On 5/7/26 20:42, Namjae Jeon wrote:
> In filesystems that maintain a separate Valid Data Length, such as exFAT
> and NTFS, a partial write may start at or beyond the current valid_size and
> extend it. In this case, the region after the previous valid_size but
> within the same filesystem block is considered unwritten.
>
> This patch introduces IOMAP_F_ZERO_TAIL. When this flag is set in iomap,
> __iomap_write_begin() will zero only the tail portion while preserving any
> valid data before it in the same block.
>
> Without this tail zeroing, stale data in the unwritten portion of the block
> can remain in the page cache. Subsequent reads can then return incorrect
> contents from that region.
>
> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
> ---
> fs/iomap/buffered-io.c | 4 ++++
> include/linux/iomap.h | 4 ++++
> 2 files changed, 8 insertions(+)
Hi, Namjae
While testing this patch series, I found that generic/299 can cause my
virtual machine to become unresponsive for a long time. Calling
balance_dirty_pages_ratelimited in the loop of iomap_zero_iter can
resolve this issue, so perhaps we need to add this change.
I am continuing to test this patch series, but I'm not sure I will be
able to find all possible issues :)
thanks,
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d7b648421a70..9ae136a9fcd4 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -837,6 +837,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
> folio_zero_segments(folio, poff, from, to, poff + plen);
> } else {
> int status;
> + const struct iomap *iomap = iomap_iter_srcmap(iter);
>
> if (iter->flags & IOMAP_NOWAIT)
> return -EAGAIN;
> @@ -853,6 +854,9 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
> len, status, GFP_NOFS);
> if (status)
> return status;
> +
> + if (iomap->flags & IOMAP_F_ZERO_TAIL)
> + folio_zero_segment(folio, to, poff + plen);
> }
> iomap_set_range_uptodate(folio, poff, plen);
> } while ((block_start += plen) < block_end);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 2c5685adf3a9..e3e7ad72b29e 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -67,6 +67,9 @@ struct vm_fault;
> * bio, i.e. set REQ_ATOMIC.
> *
> * IOMAP_F_INTEGRITY indicates that the filesystems handles integrity metadata.
> + *
> + * IOMAP_F_ZERO_TAIL indicates only the unwritten tail of the block should be
> + * zeroed.
> */
> #define IOMAP_F_NEW (1U << 0)
> #define IOMAP_F_DIRTY (1U << 1)
> @@ -86,6 +89,7 @@ struct vm_fault;
> #else
> #define IOMAP_F_INTEGRITY 0
> #endif /* CONFIG_BLK_DEV_INTEGRITY */
> +#define IOMAP_F_ZERO_TAIL (1U << 10)
>
> /*
> * Flag reserved for file system specific usage
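For readers following the IOMAP_F_ZERO_TAIL hunk quoted above, the added folio_zero_segment(folio, to, poff + plen) call can be modelled in user space roughly as below. This is only a sketch: a byte buffer stands in for the folio, and zero_tail() is a hypothetical name that does not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the tail-zeroing step: after the block has been read in,
 * clear bytes [to, block_end) so that stale on-disk data after the written
 * range never becomes visible, while bytes before `to` are left untouched.
 * `buf` stands in for the folio; all names here are hypothetical.
 */
static void zero_tail(unsigned char *buf, size_t to, size_t block_end)
{
	assert(to <= block_end);
	memset(buf + to, 0, block_end - to);
}
```

Here `to` corresponds to the end of the range being written and `block_end` to poff + plen in the hunk. Everything before `to` keeps whatever was read from disk or is about to be overwritten by the write itself.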
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag
2026-05-09 9:59 ` Chi Zhiling
@ 2026-05-09 14:30 ` Namjae Jeon
0 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-09 14:30 UTC (permalink / raw)
To: Chi Zhiling
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
>
> Hi, Namjae
Hi Chi,
>
> While testing this patch series, I found that generic/299 can cause my
> virtual machine to become unresponsive for a long time. Calling
> balance_dirty_pages_ratelimited in the loop of iomap_zero_iter can
> resolve this issue, so perhaps we need to add this change.
I agree that this should be fixed with a separate patch as you
suggested. I would be very grateful if you could send a patch for this
fix. I will also take a look.
>
> I am continuing to test this patch series, but I'm not sure I will be
> able to find all possible issues :)
Let me know if you find more issues.
Thanks!
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 4/9] exfat: add support for multi-cluster allocation
2026-05-07 12:42 ` [PATCH v2 4/9] exfat: add support for multi-cluster allocation Namjae Jeon
2026-05-07 14:09 ` CharSyam
@ 2026-05-10 13:32 ` Chi Zhiling
2026-05-11 0:20 ` Namjae Jeon
1 sibling, 1 reply; 23+ messages in thread
From: Chi Zhiling @ 2026-05-10 13:32 UTC (permalink / raw)
To: Namjae Jeon, sj1557.seo, yuezhang.mo, brauner, djwong, hch
Cc: linux-fsdevel, anmuxixixi, dxdt, chizhiling, linux-kernel
On 5/7/26 8:42 PM, Namjae Jeon wrote:
> Currently exfat_map_cluster() allocates and returns only one cluster
> at a time even when more clusters are needed. This causes multiple
> FAT walks and repeated allocation calls during large sequential writes
> or when using iomap for writes. Change exfat_map_cluster() and
> exfat_alloc_cluster() to be able to allocate multiple contiguous
> clusters.
>
> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
> ---
> fs/exfat/dir.c | 2 +-
> fs/exfat/exfat_fs.h | 2 +-
> fs/exfat/fatent.c | 26 ++++++++++++++++----------
> fs/exfat/file.c | 2 +-
> fs/exfat/inode.c | 23 ++++++-----------------
> fs/exfat/namei.c | 2 +-
> 6 files changed, 26 insertions(+), 31 deletions(-)
>
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index ca9d707220df..37f324399b6b 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c
> @@ -295,7 +295,7 @@ int exfat_alloc_new_dir(struct inode *inode, struct exfat_chain *clu)
>
> exfat_chain_set(clu, EXFAT_EOF_CLUSTER, 0, ALLOC_NO_FAT_CHAIN);
>
> - ret = exfat_alloc_cluster(inode, 1, clu, IS_DIRSYNC(inode));
> + ret = exfat_alloc_cluster(inode, 1, clu, IS_DIRSYNC(inode), false);
> if (ret)
> return ret;
>
> diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
> index 9c8ab3df7a42..6e7fd6822b01 100644
> --- a/fs/exfat/exfat_fs.h
> +++ b/fs/exfat/exfat_fs.h
> @@ -497,7 +497,7 @@ int exfat_clear_volume_dirty(struct super_block *sb);
> exfat_cluster_walk(sb, (pclu), 1, ALLOC_FAT_CHAIN)
>
> int exfat_alloc_cluster(struct inode *inode, unsigned int num_alloc,
> - struct exfat_chain *p_chain, bool sync_bmap);
> + struct exfat_chain *p_chain, bool sync_bmap, bool contig);
> int exfat_free_cluster(struct inode *inode, struct exfat_chain *p_chain);
> int exfat_ent_get(struct super_block *sb, unsigned int loc,
> unsigned int *content, struct buffer_head **last);
> diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
> index 45b0b754a2e4..a917c954bd23 100644
> --- a/fs/exfat/fatent.c
> +++ b/fs/exfat/fatent.c
> @@ -419,7 +419,7 @@ int exfat_zeroed_cluster(struct inode *dir, unsigned int clu)
> }
>
> int exfat_alloc_cluster(struct inode *inode, unsigned int num_alloc,
> - struct exfat_chain *p_chain, bool sync_bmap)
> + struct exfat_chain *p_chain, bool sync_bmap, bool contig)
> {
> int ret = -ENOSPC;
> unsigned int total_cnt;
> @@ -470,14 +470,20 @@ int exfat_alloc_cluster(struct inode *inode, unsigned int num_alloc,
>
> while ((new_clu = exfat_find_free_bitmap(sb, hint_clu)) !=
> EXFAT_EOF_CLUSTER) {
> - if (new_clu != hint_clu &&
> - p_chain->flags == ALLOC_NO_FAT_CHAIN) {
> - if (exfat_chain_cont_cluster(sb, p_chain->dir,
> - p_chain->size)) {
> - ret = -EIO;
> - goto free_cluster;
> + if (new_clu != hint_clu) {
> + if (p_chain->flags == ALLOC_NO_FAT_CHAIN) {
> + if (exfat_chain_cont_cluster(sb, p_chain->dir,
> + p_chain->size)) {
> + ret = -EIO;
> + goto free_cluster;
> + }
> + p_chain->flags = ALLOC_FAT_CHAIN;
> + }
> +
> + if (contig && p_chain->size > 0) {
> + hint_clu--;
> + goto done;
> }
> - p_chain->flags = ALLOC_FAT_CHAIN;
> }
>
> /* update allocation bitmap */
> @@ -507,9 +513,9 @@ int exfat_alloc_cluster(struct inode *inode, unsigned int num_alloc,
> last_clu = new_clu;
>
> if (p_chain->size == num_alloc) {
> +done:
> sbi->clu_srch_ptr = hint_clu;
> - sbi->used_clusters += num_alloc;
> -
> + sbi->used_clusters += p_chain->size;
> mutex_unlock(&sbi->bitmap_lock);
> return 0;
> }
> diff --git a/fs/exfat/file.c b/fs/exfat/file.c
> index b7a4964631a0..15b9d6a1766a 100644
> --- a/fs/exfat/file.c
> +++ b/fs/exfat/file.c
> @@ -56,7 +56,7 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
> clu.flags = ei->flags;
>
> ret = exfat_alloc_cluster(inode, new_num_clusters - num_clusters,
> - &clu, inode_needs_sync(inode));
> + &clu, inode_needs_sync(inode), false);
> if (ret)
> return ret;
>
> diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
> index 1246713567a5..7b09d94ac464 100644
> --- a/fs/exfat/inode.c
> +++ b/fs/exfat/inode.c
> @@ -137,9 +137,9 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
> unsigned int num_to_be_allocated = 0, num_clusters;
>
> num_clusters = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
> -
> - if (clu_offset >= num_clusters)
> - num_to_be_allocated = clu_offset - num_clusters + 1;
> + if (clu_offset > num_clusters ||
> + *count > num_clusters - clu_offset)
> + num_to_be_allocated = clu_offset + *count - num_clusters;
We need to truncate *count for read operations so that it does not
extend beyond EOF. Otherwise, xfstests generic/003 will fail.
thanks,
>
> if (!create && (num_to_be_allocated > 0)) {
> *clu = EXFAT_EOF_CLUSTER;
> @@ -182,7 +182,7 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
> }
>
> ret = exfat_alloc_cluster(inode, num_to_be_allocated, &new_clu,
> - inode_needs_sync(inode));
> + inode_needs_sync(inode), true);
> if (ret)
> return ret;
>
> @@ -216,20 +216,9 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
> }
>
> *clu = new_clu.dir;
> + *count = new_clu.size;
>
> - inode->i_blocks +=
> - exfat_cluster_to_sectors(sbi, num_to_be_allocated) >> 9;
> -
> - /*
> - * Move *clu pointer along FAT chains (hole care) because the
> - * caller of this function expect *clu to be the last cluster.
> - * This only works when num_to_be_allocated >= 2,
> - * *clu = (the first cluster of the allocated chain) =>
> - * (the last cluster of ...)
> - */
> - if (exfat_cluster_walk(sb, clu, num_to_be_allocated - 1, ei->flags))
> - return -EIO;
> - *count = 1;
> + inode->i_blocks += exfat_cluster_to_sectors(sbi, new_clu.size);
> if (balloc)
> *balloc = true;
> }
> diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c
> index 752fbec9316b..87e55c5c1bf4 100644
> --- a/fs/exfat/namei.c
> +++ b/fs/exfat/namei.c
> @@ -341,7 +341,7 @@ int exfat_find_empty_entry(struct inode *inode,
> }
>
> /* allocate a cluster */
> - ret = exfat_alloc_cluster(inode, 1, &clu, IS_DIRSYNC(inode));
> + ret = exfat_alloc_cluster(inode, 1, &clu, IS_DIRSYNC(inode), false);
> if (ret)
> return ret;
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 4/9] exfat: add support for multi-cluster allocation
2026-05-10 13:32 ` Chi Zhiling
@ 2026-05-11 0:20 ` Namjae Jeon
2026-05-11 0:45 ` Chi Zhiling
0 siblings, 1 reply; 23+ messages in thread
From: Namjae Jeon @ 2026-05-11 0:20 UTC (permalink / raw)
To: Chi Zhiling
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
> > diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
> > index 1246713567a5..7b09d94ac464 100644
> > --- a/fs/exfat/inode.c
> > +++ b/fs/exfat/inode.c
> > @@ -137,9 +137,9 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
> > unsigned int num_to_be_allocated = 0, num_clusters;
> >
> > num_clusters = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
> > -
> > - if (clu_offset >= num_clusters)
> > - num_to_be_allocated = clu_offset - num_clusters + 1;
> > + if (clu_offset > num_clusters ||
> > + *count > num_clusters - clu_offset)
> > + num_to_be_allocated = clu_offset + *count - num_clusters;
>
> We need to truncate *count for read operations so that it does not
> extend beyond EOF. Otherwise, xfstests generic/003 will fail.
I'm not seeing this failure... Can you elaborate on why it is
failing on your side?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 4/9] exfat: add support for multi-cluster allocation
2026-05-11 0:20 ` Namjae Jeon
@ 2026-05-11 0:45 ` Chi Zhiling
0 siblings, 0 replies; 23+ messages in thread
From: Chi Zhiling @ 2026-05-11 0:45 UTC (permalink / raw)
To: Namjae Jeon
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
On 5/11/26 08:20, Namjae Jeon wrote:
>>> diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
>>> index 1246713567a5..7b09d94ac464 100644
>>> --- a/fs/exfat/inode.c
>>> +++ b/fs/exfat/inode.c
>>> @@ -137,9 +137,9 @@ static int exfat_map_cluster(struct inode *inode, unsigned int clu_offset,
>>> unsigned int num_to_be_allocated = 0, num_clusters;
>>>
>>> num_clusters = exfat_bytes_to_cluster(sbi, exfat_ondisk_size(inode));
>>> -
>>> - if (clu_offset >= num_clusters)
>>> - num_to_be_allocated = clu_offset - num_clusters + 1;
>>> + if (clu_offset > num_clusters ||
>>> + *count > num_clusters - clu_offset)
>>> + num_to_be_allocated = clu_offset + *count - num_clusters;
>>
>> We need to truncate *count for read operations so that it does not
>> extend beyond EOF. Otherwise, xfstests generic/003 will fail.
> I'm not seeing this failure... Can you elaborate on why it is
> failing on your side?
In my VM with 6G of RAM and a 40G test disk, when the cluster size is 512
bytes, reading a 512-byte file (which consists of one cluster) fails
because iomap attempts to fetch a map for the 0–4K range, which exceeds
the file's EOF.
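The truncation suggested earlier in this subthread could look roughly like the user-space sketch below. clamp_read_count() is a hypothetical helper, not part of the patch. As I read the quoted hunk, a non-allocating lookup for 8 clusters on a one-cluster file ends up with num_to_be_allocated = 7 and is therefore answered with EXFAT_EOF_CLUSTER; trimming the count first avoids that.

```c
#include <assert.h>

/*
 * Hypothetical helper: for a lookup that must not allocate, trim the
 * requested cluster count so the mapped range stops at the last cluster
 * the file owns. Returns the number of clusters that can be mapped, or 0
 * when clu_offset is already at or past EOF.
 */
static unsigned int clamp_read_count(unsigned int clu_offset,
				     unsigned int count,
				     unsigned int num_clusters)
{
	if (clu_offset >= num_clusters)
		return 0;
	if (count > num_clusters - clu_offset)
		count = num_clusters - clu_offset;
	return count;
}

/*
 * Worked example from the report: 512-byte clusters, a one-cluster file,
 * and a 4 KiB mapping request (8 clusters) starting at offset 0.
 */
static void example_512b_cluster_read(void)
{
	assert(clamp_read_count(0, 8, 1) == 1);
}
```

Whether the clamp belongs in exfat_map_cluster() itself or in its iomap caller is left open here; the sketch only shows the arithmetic.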
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag
2026-05-07 12:42 ` [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag Namjae Jeon
2026-05-09 9:59 ` Chi Zhiling
@ 2026-05-11 12:45 ` Christoph Hellwig
2026-05-11 13:46 ` Namjae Jeon
1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2026-05-11 12:45 UTC (permalink / raw)
To: Namjae Jeon
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, hch, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
Note: out of subsystem patches should usually go first to stand out.
> @@ -853,6 +854,9 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
> len, status, GFP_NOFS);
> if (status)
> return status;
> +
> + if (iomap->flags & IOMAP_F_ZERO_TAIL)
> + folio_zero_segment(folio, to, poff + plen);
> + * IOMAP_F_ZERO_TAIL indicates only the unwritten tail of the block should be
> + * zeroed.
So trying to understand what you are doing here.
We're writing to a block with the magic "all invalid beyond this marker"
semantics, and you want the data beyond the file write zeroed. This seems
really nice and simple, but talking about 'unwritten' confused me, as I
assumed it to be an unwritten extent. But I guess this is not actually
reported as an unwritten extent, right? Maybe drop the unwritten here or
reword it as
* IOMAP_F_ZERO_TAIL indicates the remainder of the block after the
* data written should be zeroed.
?
And for all this to trigger, the write needs to be at valid_size
but before i_size so that iomap_block_needs_zeroing does not kick in,
right?
And this does n
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 5/9] iomap: introduce IOMAP_F_ZERO_TAIL flag
2026-05-11 12:45 ` Christoph Hellwig
@ 2026-05-11 13:46 ` Namjae Jeon
0 siblings, 0 replies; 23+ messages in thread
From: Namjae Jeon @ 2026-05-11 13:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: sj1557.seo, yuezhang.mo, brauner, djwong, linux-fsdevel,
anmuxixixi, dxdt, chizhiling, linux-kernel
On Mon, May 11, 2026 at 9:45 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Note: out of subsystem patches should usually go first to stand out.
Okay.
>
> > @@ -853,6 +854,9 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
> > len, status, GFP_NOFS);
> > if (status)
> > return status;
> > +
> > + if (iomap->flags & IOMAP_F_ZERO_TAIL)
> > + folio_zero_segment(folio, to, poff + plen);
>
> > + * IOMAP_F_ZERO_TAIL indicates only the unwritten tail of the block should be
> > + * zeroed.
>
> So trying to understand what you are doing here.
>
> We're writing to a block with the magic all invalid beyond this marker,
> and you want the data beyond the file write zeroed. This seems really
> nice and simple, but talking about 'unwritten' confused me, as I assumed it
> to be an unwritten extent. But I guess this is not actually reported
> as an unwritten extent, right? Maybe drop the 'unwritten' here or
> reword it as
Right, I agree.
>
> * IOMAP_F_ZERO_TAIL indicates the remainder of the block after the
> * data written should be zeroed.
>
> ?
I will update it like this in the next version.
>
> And for all this to trigger, the write needs to be at valid_size
> but before i_size so that iomap_block_needs_zeroing does not kick in,
> right?
Right.
Thanks for the review!
> And this does n