Linux EXT4 FS development


* Re: [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate
From: Christoph Hellwig @ 2026-05-13  5:58 UTC
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song,
	Christian Brauner, Jens Axboe, David Sterba, Theodore Ts'o,
	Jaegeuk Kim, Chao Yu, Trond Myklebust, Anna Schumaker,
	Namjae Jeon, Hyunchul Lee, Steve French, Paulo Alcantara,
	Carlos Maiolino, Damien Le Moal, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512170846.GJ9555@frogsfrogsfrogs>

On Tue, May 12, 2026 at 10:08:46AM -0700, Darrick J. Wong wrote:
> > +	/* Only one bdev per swap file for now. */
> > +	if (!sis->bdev)
> > +		sis->bdev = bdev;
> > +	else if (bdev != sis->bdev)
> > +		return -EINVAL;
> 
> Should this return error if the bdev is zoned?  AFAICT XFS and zonefs
> already guard against this, but other fses might be more naïve.

Yes, now that the bdev is passed down to add_swap_extent we could
consolidate the check here.



* Re: [PATCH 08/12] swap,iomap: simplify iomap_swapfile_iter
From: Christoph Hellwig @ 2026-05-13  6:56 UTC
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song,
	Christian Brauner, Jens Axboe, David Sterba, Theodore Ts'o,
	Jaegeuk Kim, Chao Yu, Trond Myklebust, Anna Schumaker,
	Namjae Jeon, Hyunchul Lee, Steve French, Paulo Alcantara,
	Carlos Maiolino, Damien Le Moal, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512170204.GI9555@frogsfrogsfrogs>

On Tue, May 12, 2026 at 10:02:04AM -0700, Darrick J. Wong wrote:
> OH.  Now I remember why -- it's to handle contiguous mixed mappings
> better.
> 
> Let's say that you have a 1k fsblock filesystem and 4k base pages.  You
> fallocate an 8G swap file and then mkswap it.  The first mapping is a 1k
> written mapping at offset 0 for the swap header, followed by an 8388607k
> unwritten mapping at offset 1k.
> 
> The PAGE_SIZE rounding code in iomap_swapfile_add_extent will round the
> end of that first mapping down to zero and ignore it.  The second
> mapping will be treated as if it were an 8388604k mapping starting at
> offset 4096.  Now the page counts are wrong and the swapon fails.

Do we care about this use case?  I guess you did as you implemented
this, but still?

> 
> A more generic solution to this would be to change add_swap_extent to
> take sector_t addr and length values and use them to construct a bitmap
> representing contiguous physical space on the bdev, accounting of course
> for PAGE_SIZE alignment.  Except for the swap header page, every other
> contiguously set page-aligned region in the bitmap gets added to the
> swap extent map.

You don't even need a bitmap, just do basically the same checks as
the iomap code when moving to a new swap extent after moving to use
the sector_t.  And it really should anyway, as the current abuse of
sector_t to store a disk offset in PAGE_SIZE units is pretty gross.
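The page-count mismatch described in the quoted example can be reproduced with a small userspace model. This is only a sketch, not the iomap code: it assumes 4k pages, that the 1k written mapping at offset 0 is followed directly by the unwritten mapping, and that both mappings are physically contiguous (physical addresses are not modelled at all). The names `struct mapping`, `whole_pages()`, `pages_rounded_separately()` and `pages_merged()` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A file mapping in bytes: logical offset and length (hypothetical model). */
struct mapping {
	uint64_t offset;
	uint64_t length;
};

/*
 * Whole pages covered by one mapping after rounding its start up and
 * its end down to page_size.
 */
uint64_t whole_pages(struct mapping m, uint64_t page_size)
{
	uint64_t first, end;

	assert(page_size != 0);
	first = (m.offset + page_size - 1) / page_size;	/* round start up */
	end = (m.offset + m.length) / page_size;	/* round end down */
	return end > first ? end - first : 0;
}

/* Page count when every mapping is rounded on its own. */
uint64_t pages_rounded_separately(const struct mapping *maps, int n,
				  uint64_t page_size)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += whole_pages(maps[i], page_size);
	return total;
}

/* Page count when logically adjacent mappings are merged before rounding. */
uint64_t pages_merged(const struct mapping *maps, int n, uint64_t page_size)
{
	uint64_t total = 0;
	struct mapping run;
	int i;

	if (n <= 0)
		return 0;
	run = maps[0];
	for (i = 1; i < n; i++) {
		if (maps[i].offset == run.offset + run.length) {
			run.length += maps[i].length;	/* extend the run */
		} else {
			total += whole_pages(run, page_size);
			run = maps[i];
		}
	}
	return total + whole_pages(run, page_size);
}
```

With the mappings {0, 1k} and {1k, 8388607k}, rounding each mapping separately drops the first page entirely, while merging first keeps all 8G worth of pages. That one missing page is the discrepancy that makes swapon fail in the scenario above.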



* Re: [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate
From: Damien Le Moal @ 2026-05-13  7:44 UTC
  To: Christoph Hellwig, Darrick J. Wong
  Cc: Andrew Morton, Chris Li, Kairui Song, Christian Brauner,
	Jens Axboe, David Sterba, Theodore Ts'o, Jaegeuk Kim, Chao Yu,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Hyunchul Lee,
	Steve French, Paulo Alcantara, Carlos Maiolino, Naohiro Aota,
	linux-xfs, linux-fsdevel, linux-doc, linux-mm, linux-block,
	linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260513055806.GC1236@lst.de>

On 5/13/26 14:58, Christoph Hellwig wrote:
> On Tue, May 12, 2026 at 10:08:46AM -0700, Darrick J. Wong wrote:
>>> +	/* Only one bdev per swap file for now. */
>>> +	if (!sis->bdev)
>>> +		sis->bdev = bdev;
>>> +	else if (bdev != sis->bdev)
>>> +		return -EINVAL;
>>
>> Should this return error if the bdev is zoned?  AFAICT XFS and zonefs
>> already guard against this, but other fses might be more naïve.
> 
> Yes, now that the bdev is passed down to add_swap_extent we could
> consolidate the check here.

Hmmm... With zonefs, swap files can be created on top of conventional zone
files. So enforcing "no swap on zoned device" here would break that.


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate
From: Christoph Hellwig @ 2026-05-13  7:46 UTC
  To: Damien Le Moal
  Cc: Christoph Hellwig, Darrick J. Wong, Andrew Morton, Chris Li,
	Kairui Song, Christian Brauner, Jens Axboe, David Sterba,
	Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
	Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
	Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <acd6428b-a352-4f7b-a349-b2c9e341fd87@kernel.org>

On Wed, May 13, 2026 at 04:44:53PM +0900, Damien Le Moal wrote:
> Hmmm... With zonefs, swap files can be created on top of conventional zone
> files. So enforcing "no swap on zoned device" here would break that.

We can check that none of the extents fall onto sequential zones instead
of just devices.

I still wonder why you bother with swap to zonefs at all, though.
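A per-extent check of the kind suggested here could look roughly like the following userspace model. It is a sketch under stated assumptions, not kernel code: the device is assumed to have uniformly sized zones, and `struct model_zoned_dev`, `enum zone_type` and `extent_hits_sequential_zone()` are hypothetical names that do not correspond to any block layer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQUENTIAL };

/* Hypothetical description of a zoned device with uniform zone size. */
struct model_zoned_dev {
	uint64_t zone_sectors;		/* sectors per zone */
	size_t nr_zones;		/* number of entries in zones[] */
	const enum zone_type *zones;	/* type of each zone */
};

/*
 * Return true if the extent [start, start + len) overlaps a sequential
 * zone. Extents reaching past the last zone are rejected conservatively.
 */
bool extent_hits_sequential_zone(const struct model_zoned_dev *dev,
				 uint64_t start, uint64_t len)
{
	uint64_t first, last, z;

	assert(dev->zone_sectors != 0);
	if (len == 0)
		return false;

	first = start / dev->zone_sectors;
	last = (start + len - 1) / dev->zone_sectors;
	for (z = first; z <= last; z++) {
		if (z >= dev->nr_zones)
			return true;	/* out of range: treat as unusable */
		if (dev->zones[z] == ZONE_SEQUENTIAL)
			return true;
	}
	return false;
}
```

In this model a swap file whose extents all sit in conventional zones passes the check, which matches the zonefs case raised above, while any extent touching a sequential zone is refused.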



* Re: [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate
From: Damien Le Moal @ 2026-05-13  7:58 UTC
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Andrew Morton, Chris Li, Kairui Song,
	Christian Brauner, Jens Axboe, David Sterba, Theodore Ts'o,
	Jaegeuk Kim, Chao Yu, Trond Myklebust, Anna Schumaker,
	Namjae Jeon, Hyunchul Lee, Steve French, Paulo Alcantara,
	Carlos Maiolino, Naohiro Aota, linux-xfs, linux-fsdevel,
	linux-doc, linux-mm, linux-block, linux-btrfs, linux-ext4,
	linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260513074608.GA3693@lst.de>

On 5/13/26 16:46, Christoph Hellwig wrote:
> On Wed, May 13, 2026 at 04:44:53PM +0900, Damien Le Moal wrote:
>> Hmmm... With zonefs, swap files can be created on top of conventional zone
>> files. So enforcing "no swap on zoned device" here would break that.
> 
> We can check that none of the extents fall onto sequential zones instead
> of just devices.
> 
> I still wonder why you bother with swap to zonefs at all, though.

Yeah. I do not think anyone actually uses that... But since it has been there
from the start, we are kind of stuck with it now.


-- 
Damien Le Moal
Western Digital Research


* [PATCH] ext4: fix fast commit wait/wake bit mapping on 64-bit
From: Li Chen @ 2026-05-13  8:58 UTC
  To: Theodore Ts'o
  Cc: Andreas Dilger, Baokun Li, Jan Kara, Ojaswin Mujoo,
	Ritesh Harjani, Zhang Yi, linux-ext4, linux-kernel,
	Sashiko AI review

On 64-bit, ext4 dynamic inode states live in the upper half of i_flags,
and ext4_test_inode_state() applies the corresponding +32 offset.

The fast-commit wait and wake paths open-coded the wait key with the raw
EXT4_STATE_* value. Add small helpers for the state wait word and bit,
and use them for the FC_COMMITTING and FC_FLUSHING_DATA waits so the wait
key follows the same mapping as the state helpers.

Fixes: 857d32f26181 ("ext4: rework fast commit commit path")
Reported-by: Sashiko AI review <sashiko-bot@kernel.org>
Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
---
 fs/ext4/ext4.h        | 20 +++++++++++++++++
 fs/ext4/fast_commit.c | 50 ++++++++++++++++---------------------------
 2 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 94283a991e5c..6569d1d575a0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2000,6 +2000,8 @@ EXT4_INODE_BIT_FNS(flag, flags, 0)
 static inline int ext4_test_inode_state(struct inode *inode, int bit);
 static inline void ext4_set_inode_state(struct inode *inode, int bit);
 static inline void ext4_clear_inode_state(struct inode *inode, int bit);
+static inline unsigned long *ext4_inode_state_wait_word(struct inode *inode);
+static inline int ext4_inode_state_wait_bit(int bit);
 #if (BITS_PER_LONG < 64)
 EXT4_INODE_BIT_FNS(state, state_flags, 0)
 
@@ -2015,6 +2017,24 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 	/* We depend on the fact that callers will set i_flags */
 }
 #endif
+
+static inline unsigned long *ext4_inode_state_wait_word(struct inode *inode)
+{
+#if (BITS_PER_LONG < 64)
+	return &EXT4_I(inode)->i_state_flags;
+#else
+	return &EXT4_I(inode)->i_flags;
+#endif
+}
+
+static inline int ext4_inode_state_wait_bit(int bit)
+{
+#if (BITS_PER_LONG < 64)
+	return bit;
+#else
+	return bit + 32;
+#endif
+}
 #else
 /* Assume that user mode programs are passing in an ext4fs superblock, not
  * a kernel struct super_block.  This will allow us to call the feature-test
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index b3c22636251d..1775bce9649a 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -239,6 +239,8 @@ void ext4_fc_del(struct inode *inode)
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct ext4_fc_dentry_update *fc_dentry;
 	wait_queue_head_t *wq;
+	unsigned long *wait_word = ext4_inode_state_wait_word(inode);
+	int wait_bit = ext4_inode_state_wait_bit(EXT4_STATE_FC_FLUSHING_DATA);
 	int alloc_ctx;
 
 	if (ext4_fc_disabled(inode->i_sb))
@@ -268,17 +270,9 @@ void ext4_fc_del(struct inode *inode)
 	WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
 		&& !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
 	while (ext4_test_inode_state(inode, EXT4_STATE_FC_FLUSHING_DATA)) {
-#if (BITS_PER_LONG < 64)
-		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
-				EXT4_STATE_FC_FLUSHING_DATA);
-		wq = bit_waitqueue(&ei->i_state_flags,
-				   EXT4_STATE_FC_FLUSHING_DATA);
-#else
-		DEFINE_WAIT_BIT(wait, &ei->i_flags,
-				EXT4_STATE_FC_FLUSHING_DATA);
-		wq = bit_waitqueue(&ei->i_flags,
-				   EXT4_STATE_FC_FLUSHING_DATA);
-#endif
+		DEFINE_WAIT_BIT(wait, wait_word, wait_bit);
+
+		wq = bit_waitqueue(wait_word, wait_bit);
 		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
 		if (ext4_test_inode_state(inode, EXT4_STATE_FC_FLUSHING_DATA)) {
 			ext4_fc_unlock(inode->i_sb, alloc_ctx);
@@ -542,6 +536,8 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	wait_queue_head_t *wq;
+	unsigned long *wait_word = ext4_inode_state_wait_word(inode);
+	int wait_bit = ext4_inode_state_wait_bit(EXT4_STATE_FC_COMMITTING);
 	int ret;
 
 	if (S_ISDIR(inode->i_mode))
@@ -564,17 +560,9 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
 	lockdep_assert_not_held(&ei->i_data_sem);
 
 	while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
-#if (BITS_PER_LONG < 64)
-		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
-				EXT4_STATE_FC_COMMITTING);
-		wq = bit_waitqueue(&ei->i_state_flags,
-				   EXT4_STATE_FC_COMMITTING);
-#else
-		DEFINE_WAIT_BIT(wait, &ei->i_flags,
-				EXT4_STATE_FC_COMMITTING);
-		wq = bit_waitqueue(&ei->i_flags,
-				   EXT4_STATE_FC_COMMITTING);
-#endif
+		DEFINE_WAIT_BIT(wait, wait_word, wait_bit);
+
+		wq = bit_waitqueue(wait_word, wait_bit);
 		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
 		if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
 			schedule();
@@ -1034,6 +1022,8 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	int ret = 0;
 	u32 crc = 0;
 	int alloc_ctx;
+	int flushing_wait_bit =
+		ext4_inode_state_wait_bit(EXT4_STATE_FC_FLUSHING_DATA);
 
 	/*
 	 * Step 1: Mark all inodes on s_fc_q[MAIN] with
@@ -1059,11 +1049,8 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
 		ext4_clear_inode_state(&iter->vfs_inode,
 				       EXT4_STATE_FC_FLUSHING_DATA);
-#if (BITS_PER_LONG < 64)
-		wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_FLUSHING_DATA);
-#else
-		wake_up_bit(&iter->i_flags, EXT4_STATE_FC_FLUSHING_DATA);
-#endif
+		wake_up_bit(ext4_inode_state_wait_word(&iter->vfs_inode),
+			    flushing_wait_bit);
 	}
 
 	/*
@@ -1279,6 +1266,8 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 	struct ext4_inode_info *ei;
 	struct ext4_fc_dentry_update *fc_dentry;
 	int alloc_ctx;
+	int committing_wait_bit =
+		ext4_inode_state_wait_bit(EXT4_STATE_FC_COMMITTING);
 
 	if (full && sbi->s_fc_bh)
 		sbi->s_fc_bh = NULL;
@@ -1315,11 +1304,8 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 		 * barrier in prepare_to_wait() in ext4_fc_track_inode().
 		 */
 		smp_mb();
-#if (BITS_PER_LONG < 64)
-		wake_up_bit(&ei->i_state_flags, EXT4_STATE_FC_COMMITTING);
-#else
-		wake_up_bit(&ei->i_flags, EXT4_STATE_FC_COMMITTING);
-#endif
+		wake_up_bit(ext4_inode_state_wait_word(&ei->vfs_inode),
+			    committing_wait_bit);
 	}
 
 	while (!list_empty(&sbi->s_fc_dentry_q[FC_Q_MAIN])) {
-- 
2.53.0
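The word/bit mapping the two new helpers encode can be illustrated with a standalone userspace model. This is a sketch only: `struct model_inode` is a stand-in for the two words in struct ext4_inode_info, the word size is taken from the host's `unsigned long` rather than from BITS_PER_LONG, and none of the function names below exist in ext4.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the flag words of struct ext4_inode_info. */
struct model_inode {
	unsigned long i_flags;
	unsigned long i_state_flags;	/* only used when long is 32 bits */
};

#define MODEL_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Word that holds the dynamic state bits on this word size. */
unsigned long *state_wait_word(struct model_inode *ei)
{
	return MODEL_BITS_PER_LONG < 64 ? &ei->i_state_flags : &ei->i_flags;
}

/* Bit index of a state bit inside that word (+32 on 64-bit). */
int state_wait_bit(int bit)
{
	return MODEL_BITS_PER_LONG < 64 ? bit : bit + 32;
}

/* Set a state bit using the same mapping as the helpers above. */
void set_state(struct model_inode *ei, int bit)
{
	*state_wait_word(ei) |= 1UL << state_wait_bit(bit);
}

/* Read one bit of a word; returns 0 or 1. */
int test_bit_in_word(const unsigned long *word, int bit)
{
	assert(bit >= 0 && bit < MODEL_BITS_PER_LONG);
	return (int)((*word >> bit) & 1UL);
}
```

After `set_state(&ei, n)`, the bit is visible at `(state_wait_word(&ei), state_wait_bit(n))`, whereas the raw pair `(&ei.i_flags, n)` used by the open-coded waits names a different bit on 64-bit. Keeping both the wait and the wake side on the helper-derived pair is what the patch does.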



* [PATCH v2] jbd2: fix integer underflow in jbd2_journal_initialize_fast_commit()
From: Junrui Luo @ 2026-05-13  9:28 UTC
  To: Theodore Ts'o, Jan Kara, Harshad Shirwadkar
  Cc: linux-ext4, linux-kernel, Yuhao Jiang, stable, Junrui Luo
In-Reply-To: <SYBPR01MB78813DD23B28BD49B1AA1123AF392@SYBPR01MB7881.ausprd01.prod.outlook.com>

jbd2_journal_initialize_fast_commit() validates journal capacity by
checking (journal->j_last - num_fc_blks < JBD2_MIN_JOURNAL_BLOCKS).
Both j_last and num_fc_blks are unsigned, so when num_fc_blks exceeds
j_last the subtraction wraps to a large value, bypassing the bounds
check.

The resulting underflow corrupts j_last, j_fc_first, and j_free,
leading to journal abort.

Fix by checking num_fc_blks against j_last before the subtraction,
returning -EFSCORRUPTED.

Fixes: 6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
---
Changes in v2:
- Return -EFSCORRUPTED instead of -ENOSPC
- Link to v1: https://lore.kernel.org/all/SYBPR01MB78813DD23B28BD49B1AA1123AF392@SYBPR01MB7881.ausprd01.prod.outlook.com/
---
 fs/jbd2/journal.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index cb2c529a8f1b..0bb97459fbf0 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2263,6 +2263,8 @@ jbd2_journal_initialize_fast_commit(journal_t *journal)
 	unsigned long long num_fc_blks;
 
 	num_fc_blks = jbd2_journal_get_num_fc_blks(sb);
+	if (num_fc_blks > journal->j_last)
+		return -EFSCORRUPTED;
 	if (journal->j_last - num_fc_blks < JBD2_MIN_JOURNAL_BLOCKS)
 		return -ENOSPC;
 

---
base-commit: 7aaa8047eafd0bd628065b15757d9b48c5f9c07d
change-id: 20260513-fixes-e6dcda3273d4

Best regards,
-- 
Junrui Luo <moonafterrain@outlook.com>
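The wraparound described in the commit message is easy to reproduce outside the kernel. The following is a minimal userspace sketch, assuming a minimum of 1024 journal blocks for JBD2_MIN_JOURNAL_BLOCKS and using made-up return codes in place of the kernel's -ENOSPC and -EFSCORRUPTED; `check_unpatched()` and `check_patched()` are illustrative names, not jbd2 functions.

```c
#include <assert.h>

/* Assumed value of JBD2_MIN_JOURNAL_BLOCKS for this model. */
#define MODEL_MIN_JOURNAL_BLOCKS 1024ULL

/* Hypothetical return codes standing in for the kernel errnos. */
enum { MODEL_OK = 0, MODEL_ENOSPC = -1, MODEL_EFSCORRUPTED = -2 };

/* Capacity check as it was before the patch. */
int check_unpatched(unsigned long long j_last, unsigned long long num_fc_blks)
{
	/* Unsigned subtraction wraps when num_fc_blks > j_last. */
	if (j_last - num_fc_blks < MODEL_MIN_JOURNAL_BLOCKS)
		return MODEL_ENOSPC;
	return MODEL_OK;
}

/* Capacity check with the extra comparison added by the patch. */
int check_patched(unsigned long long j_last, unsigned long long num_fc_blks)
{
	if (num_fc_blks > j_last)
		return MODEL_EFSCORRUPTED;
	if (j_last - num_fc_blks < MODEL_MIN_JOURNAL_BLOCKS)
		return MODEL_ENOSPC;
	return MODEL_OK;
}

/* Self-check: the oversized fast-commit area slips past the old test. */
void underflow_demo(void)
{
	assert(check_unpatched(2048, 4096) == MODEL_OK);
	assert(check_patched(2048, 4096) == MODEL_EFSCORRUPTED);
	assert(check_patched(2048, 1500) == MODEL_ENOSPC);
	assert(check_patched(8192, 256) == MODEL_OK);
}
```

With j_last = 2048 and num_fc_blks = 4096 the difference wraps to a value close to 2^64, so the old comparison against the minimum never triggers; the added ordering check catches that case before the subtraction is evaluated.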



* Re: [PATCH 6/9] fat: Fix possibly missing inode write on fsync(2)
From: Jan Kara @ 2026-05-13  9:41 UTC
  To: OGAWA Hirofumi
  Cc: Jan Kara, linux-fsdevel, Christian Brauner, aivazian.tigran,
	Ted Tso, linux-ext4
In-Reply-To: <877bp8yang.fsf@mail.parknet.co.jp>

On Tue 12-05-26 23:17:55, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> I didn't check the case of rename completely, just recalled it when I
> >> saw this code, need confirm/check.  But at least, the case of remove
> >> will leave it even after the block is reused.
> >
> > Right. fat_detach() should set i_metadata_bhs.inode_blk to INVALID_BLK,
> > thanks for catching that. I was thinking whether we should set
> > i_metadata_bhs.inode_blk in fat_attach() instead of during inode dirtying.
> > It would be somewhat more obviously correct but it could lead to
> > unnecessary flushing in case the directory block gets dirtied by some other
> > entry in it while the inode we are fsyncing never got dirtied. IMHO that's
> > a sensible tradeoff so I'd do that but what is your opinion?
> 
> IMO, the marker should be cleared after each sync, like b_assoc_buffers
> or the I_DIRTY_* flags. Otherwise, because the block is shared with other
> inodes, it would easily sync/wait on unrelated dirty data.

Well, even if we do that, we should still clear inode_blk in fat_detach()
AFAICT as nobody has to sync the inode before unlinking it.

Regarding clearing of inode_blk in mmb_sync() - yes, that makes sense and could
be done, although then we have to be careful about races of mmb_sync() with
.write_inode so that we always guarantee that mmb_sync() after .write_inode
will persist the buffer. I'll see how complex that will get.

> [And for a more serious implementation, it looks like it should be cleared
> at points similar to where b_assoc_buffers is cleared, to minimize
> unrelated sync/wait.]

OTOH this doesn't really make much sense. We need to handle b_assoc_buffers
when bh is getting reclaimed so that we can free the bh. We deliberately
track inode block number and not bh pointer in mapping_metadata_bhs so that
bhs backing inodes can be freed independently as it's infeasible to track
down all mapping_metadata_bhs structs that might be referencing this bh.

And yes, I'm aware that in some corner cases the simple tracking can result
in mmb_sync() writing out inode buffer although the inode itself was
already persisted but exact tracking is way too expensive and not worth it
for simple filesystems using this infrastructure. If we implement clearing
of inode_blk in mmb_sync(), then we'll write out inode buffer unnecessarily
at most once after the inode gets dirty which is IMO a reasonable middle
ground between complexity of the tracking and unnecessary writeback.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] ext4: fix fast commit wait/wake bit mapping on 64-bit
From: Jan Kara @ 2026-05-13  9:45 UTC
  To: Li Chen
  Cc: Theodore Ts'o, Andreas Dilger, Baokun Li, Jan Kara,
	Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, linux-ext4, linux-kernel,
	Sashiko AI review
In-Reply-To: <20260513085818.552432-1-me@linux.beauty>

On Wed 13-05-26 16:58:17, Li Chen wrote:
> On 64-bit, ext4 dynamic inode states live in the upper half of i_flags,
> and ext4_test_inode_state() applies the corresponding +32 offset.
> 
> The fast-commit wait and wake paths open-coded the wait key with the raw
> EXT4_STATE_* value. Add small helpers for the state wait word and bit,
> and use them for the FC_COMMITTING and FC_FLUSHING_DATA waits so the wait
> key follows the same mapping as the state helpers.
> 
> Fixes: 857d32f26181 ("ext4: rework fast commit commit path")
> Reported-by: Sashiko AI review <sashiko-bot@kernel.org>
> Signed-off-by: Li Chen <chenl311@chinatelecom.cn>

Nice cleanup and good spotting. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2] jbd2: fix integer underflow in jbd2_journal_initialize_fast_commit()
From: Jan Kara @ 2026-05-13  9:47 UTC
  To: Junrui Luo
  Cc: Theodore Ts'o, Jan Kara, Harshad Shirwadkar, linux-ext4,
	linux-kernel, Yuhao Jiang, stable
In-Reply-To: <SYBPR01MB7881663C927DE9D7BBF4D1DFAF062@SYBPR01MB7881.ausprd01.prod.outlook.com>

On Wed 13-05-26 17:28:40, Junrui Luo wrote:
> jbd2_journal_initialize_fast_commit() validates journal capacity by
> checking (journal->j_last - num_fc_blks < JBD2_MIN_JOURNAL_BLOCKS).
> Both j_last and num_fc_blks are unsigned, so when num_fc_blks exceeds
> j_last the subtraction wraps to a large value, bypassing the bounds
> check.
> 
> The resulting underflow corrupts j_last, j_fc_first, and j_free,
> leading to journal abort.
> 
> Fix by checking num_fc_blks against j_last before the subtraction,
> returning -EFSCORRUPTED.
> 
> Fixes: 6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization")
> Reported-by: Yuhao Jiang <danisjiang@gmail.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Junrui Luo <moonafterrain@outlook.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 9/9] ext4: Use mmb infrastructure for inode buffer writeout
From: Jan Kara @ 2026-05-13 10:45 UTC
  To: Christian Brauner
  Cc: Jan Kara, linux-fsdevel, aivazian.tigran, OGAWA Hirofumi, Ted Tso,
	linux-ext4
In-Reply-To: <20260511-paletten-kekse-7dc3bc394633@brauner>

On Mon 11-05-26 15:30:34, Christian Brauner wrote:
> On Mon, May 11, 2026 at 02:13:59PM +0200, Jan Kara wrote:
> > Use mmb inode buffer writeout infrastructure to reliably write out
> > inode's inode table block on fsync(2) in nojournal mode (from
> > ext4_sync_parent() and ext4_fsync_nojournal()). This significantly
> > simplifies the code as we don't have to explicitly handle inode buffer
> > writeback in ext4_write_inode() and thus we can also remove
> > sync_inode_metadata() calls from ext4_sync_parent() and
> > ext4_write_inode() call from ext4_fsync_nojournal().
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>

...

> > @@ -6348,7 +6330,11 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> >  
> >  	/* the do_update_inode consumes one bh->b_count */
> >  	get_bh(iloc->bh);
> > -
> > +	if (!ext4_handle_valid(handle)) {
> > +		if (!EXT4_I(inode)->i_metadata_bhs)
> > +			ext4_inode_attach_mmb(inode);
> > +		EXT4_I(inode)->i_metadata_bhs->inode_blk = iloc->bh->b_blocknr;
> 
> The series is great overall. The only thing I think we should change is
> that we should hide this
> 
> EXT4_I(inode)->i_metadata_bhs->inode_blk = iloc->bh->b_blocknr;
> 
> behind a dedicated static inline/regular function call instead of
> open-coding it everywhere. Can then also be paired with some
> VFS_WARN_ON_ONCE() to detect garbage bh->b_blocknr.

Good point. I've created a mmb_mark_inode_buffer_dirty() helper for this,
matching the mmb_mark_buffer_dirty() we use for standard metadata buffers. It
now also handles dirtying the buffer and synchronization with mmb_sync()
clearing the inode_blk.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2] jbd2: fix integer underflow in jbd2_journal_initialize_fast_commit()
From: Zhang Yi @ 2026-05-13 11:08 UTC
  To: Junrui Luo, Theodore Ts'o, Jan Kara, Harshad Shirwadkar
  Cc: linux-ext4, linux-kernel, Yuhao Jiang, stable
In-Reply-To: <SYBPR01MB7881663C927DE9D7BBF4D1DFAF062@SYBPR01MB7881.ausprd01.prod.outlook.com>

On 5/13/2026 5:28 PM, Junrui Luo wrote:
> jbd2_journal_initialize_fast_commit() validates journal capacity by
> checking (journal->j_last - num_fc_blks < JBD2_MIN_JOURNAL_BLOCKS).
> Both j_last and num_fc_blks are unsigned, so when num_fc_blks exceeds
> j_last the subtraction wraps to a large value, bypassing the bounds
> check.
> 
> The resulting underflow corrupts j_last, j_fc_first, and j_free,
> leading to journal abort.
> 
> Fix by checking num_fc_blks against j_last before the subtraction,
> returning -EFSCORRUPTED.
> 
> Fixes: 6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization")
> Reported-by: Yuhao Jiang <danisjiang@gmail.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Junrui Luo <moonafterrain@outlook.com>

Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>




* Re: [PATCH 08/12] swap,iomap: simplify iomap_swapfile_iter
From: Darrick J. Wong @ 2026-05-13 14:59 UTC
  To: Christoph Hellwig
  Cc: Andrew Morton, Chris Li, Kairui Song, Christian Brauner,
	Jens Axboe, David Sterba, Theodore Ts'o, Jaegeuk Kim, Chao Yu,
	Trond Myklebust, Anna Schumaker, Namjae Jeon, Hyunchul Lee,
	Steve French, Paulo Alcantara, Carlos Maiolino, Damien Le Moal,
	Naohiro Aota, linux-xfs, linux-fsdevel, linux-doc, linux-mm,
	linux-block, linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs,
	linux-cifs
In-Reply-To: <20260513065608.GA2250@lst.de>

On Wed, May 13, 2026 at 08:56:08AM +0200, Christoph Hellwig wrote:
> On Tue, May 12, 2026 at 10:02:04AM -0700, Darrick J. Wong wrote:
> > OH.  Now I remember why -- it's to handle contiguous mixed mappings
> > better.
> > 
> > Let's say that you have a 1k fsblock filesystem and 4k base pages.  You
> > fallocate an 8G swap file and then mkswap it.  The first mapping is a 1k
> > written mapping at offset 0 for the swap header, followed by an 8388607k
> > unwritten mapping at offset 1k.
> > 
> > The PAGE_SIZE rounding code in iomap_swapfile_add_extent will round the
> > end of that first mapping down to zero and ignore it.  The second
> > mapping will be treated as if it were a 8388604k mapping starting at
> > offset 4096.  Now the page counts are wrong and the swapon fails.
> 
> Do we care about this use case?  I guess you did as you implemented
> this, but still?

We do, because mkswap -F uses fallocate nowadays:

$ mkswap -s 4194304 -F a
Setting up swapspace version 1, size = 4 MiB (4190208 bytes)
no label, UUID=bc9746bf-e200-4944-927c-80d83872f1cb
$ filefrag -v a
Filesystem type is: 58465342
File size of a is 4194304 (1024 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:  411383552.. 411383552:      1:            
   1:        1..    1023:  411383553.. 411384575:   1023:             last,unwritten,eof
a: 1 extent found
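
The page-rounding arithmetic from the quoted 1k-fsblock scenario can be
modelled in userspace.  This is only a sketch: the struct and function
names are hypothetical, 4k pages are assumed, the second mapping is
taken to start at 1k so that both mappings add up to 8G, and the model
works on plain byte ranges rather than on what
iomap_swapfile_add_extent really sees.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed 4k base pages */

/* One file mapping in bytes; a hypothetical stand-in, not struct iomap. */
struct sketch_mapping {
	uint64_t offset;	/* start of the range, in bytes */
	uint64_t length;	/* length of the range, in bytes */
};

/*
 * Shrink a mapping to the whole pages it fully covers: round the start
 * up and the end down to SKETCH_PAGE_SIZE.  Returns the number of whole
 * pages, or 0 if the mapping does not cover a single full page.
 * *first_byte is only written when at least one page is covered.
 */
static uint64_t sketch_whole_pages(const struct sketch_mapping *m,
				   uint64_t *first_byte)
{
	uint64_t start, end;

	assert(m->length > 0);
	start = (m->offset + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	end = (m->offset + m->length) & ~(SKETCH_PAGE_SIZE - 1);
	if (end <= start)
		return 0;
	*first_byte = start;
	return (end - start) / SKETCH_PAGE_SIZE;
}
```

Fed the 1k header mapping, the helper reports no usable page; fed the
unwritten mapping, it reports a range starting at byte 4096 that is one
page short of the full file.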

> > A more generic solution to this would be to change add_swap_extent to
> > take sector_t addr and length values and use them to construct a bitmap
> > representing contiguous physical space on the bdev, accounting of course
> > for PAGE_SIZE alignment.  Except for the swap header page, every other
> > contiguously set page-aligned region in the bitmap gets added to the
> > swap extent map.
> 
> You don't even need a bitmap, just do basically the same checks as
> the iomap code when moving to a new swap extent after moving to use
> the sector_t.  And it really should anyway, as the current abuse of
> sector_t to store a disk offset in PAGE_SIZE units is pretty gross.

Oh, I meant this to handle the particularly gross case where the fsblock
size is smaller than a base page, but there are a very large number of
file mappings that point to a physically contiguous extent but are not
in logical order:

{.offset=0, .length=1k, .addr=7},
{.offset=1, .length=1k, .addr=6},
{.offset=2, .length=1k, .addr=5},
{.offset=3, .length=1k, .addr=4},
{.offset=4, .length=1k, .addr=3},
{.offset=5, .length=1k, .addr=2},
{.offset=6, .length=1k, .addr=1},
{.offset=7, .length=1k, .addr=0},

That's two pages of swapfile, but with the current layout accumulation
code we "cannot" find either.
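
A rough userspace sketch of the bitmap idea, applied to the eight
mappings above.  Everything here is hypothetical (the names, the fixed
64-block device, 1k fsblocks with 4k pages), and it ignores the swap
header page and any question of how logical offsets get assigned; it
only shows that marking physical blocks and scanning for page-aligned
full groups finds both pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BLKS_PER_PAGE 4	/* assumed: 1k fsblocks, 4k pages */
#define SKETCH_MAX_BLKS 64	/* arbitrary small device for the sketch */

/* Hypothetical mapping record: one fsblock at a logical offset. */
struct sketch_map {
	unsigned int offset;	/* logical block number */
	unsigned int addr;	/* physical block number */
};

/*
 * Mark every mapped physical block in a bitmap, then count the
 * page-aligned groups of SKETCH_BLKS_PER_PAGE blocks that are fully set.
 */
static unsigned int sketch_count_pages(const struct sketch_map *maps,
				       size_t nr)
{
	bool present[SKETCH_MAX_BLKS] = { false };
	unsigned int pages = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(maps[i].addr < SKETCH_MAX_BLKS);
		present[maps[i].addr] = true;
	}

	for (i = 0; i + SKETCH_BLKS_PER_PAGE <= SKETCH_MAX_BLKS;
	     i += SKETCH_BLKS_PER_PAGE) {
		bool full = true;
		unsigned int j;

		for (j = 0; j < SKETCH_BLKS_PER_PAGE; j++)
			full = full && present[i + j];
		if (full)
			pages++;
	}
	return pages;
}
```

Because the scan only looks at physical occupancy, the order in which
the mappings arrive does not matter.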

--D

^ permalink raw reply

* Re: [RFC v7 6/7] ext4: fast commit: add lock_updates tracepoint
From: Steven Rostedt @ 2026-05-13 17:57 UTC (permalink / raw)
  To: Li Chen
  Cc: Zhang Yi, Theodore Ts'o, Andreas Dilger, Baokun Li, Jan Kara,
	Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi, Masami Hiramatsu,
	Mathieu Desnoyers, linux-ext4, linux-kernel, linux-trace-kernel
In-Reply-To: <20260511084304.1559557-7-me@linux.beauty>

On Mon, 11 May 2026 16:43:01 +0800
Li Chen <me@linux.beauty> wrote:

> @@ -1346,8 +1383,15 @@ static int ext4_fc_perform_commit(journal_t *journal)
>  	}
>  	ext4_fc_unlock(sb, alloc_ctx);
>  
> -	ret = ext4_fc_snapshot_inodes(journal, inodes, inodes_size);
> +	ret = ext4_fc_snapshot_inodes(journal, inodes, inodes_size,
> +				      &snap_inodes, &snap_ranges, &snap_err);
>  	jbd2_journal_unlock_updates(journal);
> +	if (trace_ext4_fc_lock_updates_enabled()) {
> +		locked_ns = ktime_to_ns(ktime_sub(ktime_get(), lock_start));
> +		trace_ext4_fc_lock_updates(sb, commit_tid, locked_ns,
> +					   snap_inodes, snap_ranges, ret,
> +					   snap_err);

Please change this to:

		trace_call__ext4_fc_lock_updates(...)

As the "trace_ext4_fc_lock_updates_enabled()" already has the static
branch. No need to do it twice anymore. 7.1 introduced the
"trace_call__foo()" that will do a direct call to the tracepoints
registered, without the need for another static branch.

-- Steve


> +	}

^ permalink raw reply

* Re: improve the swap_activate interface
From: Steve French @ 2026-05-13 20:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Chris Li, Kairui Song, Christian Brauner,
	Darrick J . Wong, Jens Axboe, David Sterba, Theodore Ts'o,
	Jaegeuk Kim, Chao Yu, Trond Myklebust, Anna Schumaker,
	Namjae Jeon, Hyunchul Lee, Steve French, Paulo Alcantara,
	Carlos Maiolino, Damien Le Moal, Naohiro Aota, linux-xfs,
	linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
	linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512053625.2950900-1-hch@lst.de>

I just tried this on 7.1-rc3 with the swap patches (full kernel build
on Ubuntu 25.10) and boot failed with an out-of-memory error, which I
had never seen before.  Any idea how to work around this with the swap
patch series, or is there a fix for this in the series already?

On Tue, May 12, 2026 at 12:41 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Hi all,
>
> Darrick recently posted iomap support for fuse-iomap, which was trivial
> but a bit ugly, which triggered me into looking how this could be done
> in a cleaner way.  The result of that is this fairly big series that
> reworks how the MM code calls into the file system to activate swap
> files to make it much cleaner and easier to use.
>
> I've tested this with swap devices manually, and using the swap tests
> in xfstests on btrfs, ext3, ext4, f2fs and xfs to exercise the different
> implementations.  Out of those all passed, but f2fs actually notruns all
> tests even in the baseline as it requires special preparation for
> swapfiles which never got wired up in xfstests.
>
> Diffstat:
>  Documentation/filesystems/iomap/operations.rst |    3
>  Documentation/filesystems/locking.rst          |   35 +--
>  Documentation/filesystems/vfs.rst              |   40 ++--
>  block/fops.c                                   |   15 +
>  fs/btrfs/btrfs_inode.h                         |    3
>  fs/btrfs/file.c                                |    4
>  fs/btrfs/inode.c                               |   72 -------
>  fs/ext4/file.c                                 |    6
>  fs/ext4/inode.c                                |   11 -
>  fs/f2fs/data.c                                 |   50 -----
>  fs/f2fs/f2fs.h                                 |    2
>  fs/f2fs/file.c                                 |    4
>  fs/iomap/swapfile.c                            |  165 +++---------------
>  fs/nfs/direct.c                                |    1
>  fs/nfs/file.c                                  |   21 --
>  fs/nfs/nfs4file.c                              |    3
>  fs/ntfs/aops.c                                 |    8
>  fs/ntfs/file.c                                 |    6
>  fs/smb/client/cifsfs.c                         |   18 +
>  fs/smb/client/cifsfs.h                         |    3
>  fs/smb/client/file.c                           |   16 -
>  fs/xfs/xfs_aops.c                              |   48 -----
>  fs/xfs/xfs_file.c                              |   39 ++++
>  fs/zonefs/file.c                               |   30 +--
>  include/linux/fs.h                             |   11 -
>  include/linux/iomap.h                          |    5
>  include/linux/nfs_fs.h                         |    3
>  include/linux/swap.h                           |  129 +-------------
>  mm/page_io.c                                   |   45 ----
>  mm/swap.h                                      |   92 ++++++++++
>  mm/swapfile.c                                  |  227 ++++++++++++++-----------
>  31 files changed, 471 insertions(+), 644 deletions(-)
>


-- 
Thanks,

Steve

^ permalink raw reply

* [syzbot] [ext4?] BUG: sleeping function called from invalid context in mempool_alloc_noprof
From: syzbot @ 2026-05-14  3:27 UTC (permalink / raw)
  To: adilger.kernel, jack, libaokun, linux-ext4, linux-kernel,
	linux-usb, ojaswin, ritesh.list, syzkaller-bugs, tytso, yi.zhang

Hello,

syzbot found the following issue on:

HEAD commit:    25bd55f46032 usb: udc: pxa: remove unused platform_data
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
console output: https://syzkaller.appspot.com/x/log.txt?x=16e2ead2580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=afc495310dffaa7c
dashboard link: https://syzkaller.appspot.com/bug?extid=9fc0caf33cb36845f9b9
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/df4cd244b684/disk-25bd55f4.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/bccb34371b4c/vmlinux-25bd55f4.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d86b0bd5ea58/bzImage-25bd55f4.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9fc0caf33cb36845f9b9@syzkaller.appspotmail.com

BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:323
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 50, name: kworker/u8:4
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
4 locks held by kworker/u8:4/50:
 #0: ffff888100e9d140 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x12d6/0x1980 kernel/workqueue.c:3277
 #1: ffffc90000537d18 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x973/0x1980 kernel/workqueue.c:3278
 #2: ffff8881012bc0d8 (&type->s_umount_key#33){.+.+}-{4:4}, at: super_trylock_shared+0x1e/0xf0 fs/super.c:565
 #3: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
 #3: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
 #3: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: unlocked_inode_to_wb_begin include/linux/backing-dev.h:290 [inline]
 #3: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: folio_clear_dirty_for_io+0x1eb/0x7f0 mm/page-writeback.c:2919
CPU: 0 UID: 0 PID: 50 Comm: kworker/u8:4 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: writeback wb_workfn (flush-8:0)
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 __might_resched.cold+0x1ec/0x232 kernel/sched/core.c:9162
 might_alloc include/linux/sched/mm.h:323 [inline]
 might_alloc include/linux/sched/mm.h:315 [inline]
 mempool_alloc_noprof+0x220/0x310 mm/mempool.c:558
 bio_alloc_bioset+0x8d5/0x1050 block/bio.c:594
 bio_alloc include/linux/bio.h:367 [inline]
 submit_bh_wbc+0x250/0x710 fs/buffer.c:2716
 __block_write_full_folio+0x77f/0xee0 fs/buffer.c:1830
 block_write_full_folio+0x3b5/0x4e0 fs/buffer.c:2650
 blkdev_writepages+0xc7/0x150 block/fops.c:486
 do_writepages+0x278/0x600 mm/page-writeback.c:2575
 __writeback_single_inode+0x164/0x1350 fs/fs-writeback.c:1764
 writeback_sb_inodes+0x766/0x1c60 fs/fs-writeback.c:2056
 __writeback_inodes_wb+0xf8/0x2d0 fs/fs-writeback.c:2132
 wb_writeback+0x720/0xb90 fs/fs-writeback.c:2243
 wb_check_old_data_flush fs/fs-writeback.c:2347 [inline]
 wb_do_writeback fs/fs-writeback.c:2400 [inline]
 wb_workfn+0x8dd/0xc00 fs/fs-writeback.c:2428
 process_one_work+0xa0e/0x1980 kernel/workqueue.c:3302
 process_scheduled_works kernel/workqueue.c:3385 [inline]
 worker_thread+0x5ef/0xe50 kernel/workqueue.c:3466
 kthread+0x370/0x450 kernel/kthread.c:436
 ret_from_fork+0x69a/0xc80 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
BUG: workqueue leaked atomic, lock or RCU: kworker/u8:4[50]
     preempt=0x00000000 lock=0->1 RCU=0->1 workfn=wb_workfn
1 lock held by kworker/u8:4/50:
 #0: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
 #0: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
 #0: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: unlocked_inode_to_wb_begin include/linux/backing-dev.h:290 [inline]
 #0: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: folio_clear_dirty_for_io+0x1eb/0x7f0 mm/page-writeback.c:2919
CPU: 0 UID: 0 PID: 50 Comm: kworker/u8:4 Tainted: G        W           syzkaller #0 PREEMPT(full) 
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: writeback wb_workfn (flush-8:0)
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 process_one_work.cold+0x127/0x306 kernel/workqueue.c:3323
 process_scheduled_works kernel/workqueue.c:3385 [inline]
 worker_thread+0x5ef/0xe50 kernel/workqueue.c:3466
 kthread+0x370/0x450 kernel/kthread.c:436
 ret_from_fork+0x69a/0xc80 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

=============================
[ BUG: Invalid wait context ]
syzkaller #0 Tainted: G        W          
-----------------------------
kworker/u8:4/50 is trying to lock:
ffff88811bb071d0 (&ei->i_data_sem){++++}-{4:4}, at: ext4_map_blocks+0x45a/0xd30 fs/ext4/inode.c:823
other info that might help us debug this:
context-{5:5}
4 locks held by kworker/u8:4/50:
 #0: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
 #0: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
 #0: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: unlocked_inode_to_wb_begin include/linux/backing-dev.h:290 [inline]
 #0: ffffffff896de8e0 (rcu_read_lock){....}-{1:3}, at: folio_clear_dirty_for_io+0x1eb/0x7f0 mm/page-writeback.c:2919
 #1: ffff888113550940 ((wq_completion)ext4-rsv-conversion){+.+.}-{0:0}, at: process_one_work+0x12d6/0x1980 kernel/workqueue.c:3277
 #2: ffffc90000537d18 ((work_completion)(&ei->i_rsv_conversion_work)){+.+.}-{0:0}, at: process_one_work+0x973/0x1980 kernel/workqueue.c:3278
 #3: ffff888116262938 (jbd2_handle){.+.+}-{0:0}, at: start_this_handle+0xfaa/0x13a0 fs/jbd2/transaction.c:444
stack backtrace:
CPU: 0 UID: 0 PID: 50 Comm: kworker/u8:4 Tainted: G        W           syzkaller #0 PREEMPT(full) 
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 print_lock_invalid_wait_context kernel/locking/lockdep.c:4830 [inline]
 check_wait_context kernel/locking/lockdep.c:4902 [inline]
 __lock_acquire+0xfa4/0x2630 kernel/locking/lockdep.c:5187
 lock_acquire kernel/locking/lockdep.c:5868 [inline]
 lock_acquire+0x1b1/0x370 kernel/locking/lockdep.c:5825
 down_write+0x8b/0x1f0 kernel/locking/rwsem.c:1625
 ext4_map_blocks+0x45a/0xd30 fs/ext4/inode.c:823
 ext4_convert_unwritten_extents+0x2a6/0x4d0 fs/ext4/extents.c:5067
 ext4_convert_unwritten_io_end_vec+0x121/0x280 fs/ext4/extents.c:5107
 ext4_end_io_end+0xd3/0x4b0 fs/ext4/page-io.c:199
 ext4_do_flush_completed_IO fs/ext4/page-io.c:290 [inline]
 ext4_end_io_rsv_work+0x205/0x380 fs/ext4/page-io.c:305
 process_one_work+0xa0e/0x1980 kernel/workqueue.c:3302
 process_scheduled_works kernel/workqueue.c:3385 [inline]
 worker_thread+0x5ef/0xe50 kernel/workqueue.c:3466
 kthread+0x370/0x450 kernel/kthread.c:436
 ret_from_fork+0x69a/0xc80 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:323
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 50, name: kworker/u8:4
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
CPU: 0 UID: 0 PID: 50 Comm: kworker/u8:4 Tainted: G        W           syzkaller #0 PREEMPT(full) 
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 __might_resched.cold+0x1ec/0x232 kernel/sched/core.c:9162
 might_alloc include/linux/sched/mm.h:323 [inline]
 slab_pre_alloc_hook mm/slub.c:4520 [inline]
 slab_alloc_node mm/slub.c:4875 [inline]
 __do_kmalloc_node mm/slub.c:5294 [inline]
 __kmalloc_noprof+0x55e/0x810 mm/slub.c:5307
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 ext4_find_extent+0x21b/0xa30 fs/ext4/extents.c:918
 ext4_ext_map_blocks+0x20a/0x5930 fs/ext4/extents.c:4286
 ext4_map_create_blocks+0xec/0x5e0 fs/ext4/inode.c:631
 ext4_map_blocks+0x46b/0xd30 fs/ext4/inode.c:824
 ext4_convert_unwritten_extents+0x2a6/0x4d0 fs/ext4/extents.c:5067
 ext4_convert_unwritten_io_end_vec+0x121/0x280 fs/ext4/extents.c:5107
 ext4_end_io_end+0xd3/0x4b0 fs/ext4/page-io.c:199
 ext4_do_flush_completed_IO fs/ext4/page-io.c:290 [inline]
 ext4_end_io_rsv_work+0x205/0x380 fs/ext4/page-io.c:305
 process_one_work+0xa0e/0x1980 kernel/workqueue.c:3302
 process_scheduled_works kernel/workqueue.c:3385 [inline]
 worker_thread+0x5ef/0xe50 kernel/workqueue.c:3466
 kthread+0x370/0x450 kernel/kthread.c:436
 ret_from_fork+0x69a/0xc80 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
BUG: workqueue leaked atomic, lock or RCU: kworker/u8:4[50]
     preempt=0x00000000 lock=1->0 RCU=1->1 workfn=ext4_end_io_rsv_work
INFO: lockdep is turned off.
CPU: 0 UID: 0 PID: 50 Comm: kworker/u8:4 Tainted: G        W           syzkaller #0 PREEMPT(full) 
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 process_one_work.cold+0x127/0x306 kernel/workqueue.c:3323
 process_scheduled_works kernel/workqueue.c:3385 [inline]
 worker_thread+0x5ef/0xe50 kernel/workqueue.c:3466
 kthread+0x370/0x450 kernel/kthread.c:436
 ret_from_fork+0x69a/0xc80 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
------------[ cut here ]------------
Voluntary context switch within RCU read-side critical section!
WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0x859/0x19c0 kernel/rcu/tree_plugin.h:332, CPU#0: kworker/u8:4/50
Modules linked in:
CPU: 0 UID: 0 PID: 50 Comm: kworker/u8:4 Tainted: G        W           syzkaller #0 PREEMPT(full) 
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026
Workqueue:  0x0 (ext4-rsv-conversion)
RIP: 0010:rcu_note_context_switch+0x859/0x19c0 kernel/rcu/tree_plugin.h:332
Code: c1 ea 03 80 3c 02 00 0f 85 9b 0b 00 00 48 8b 53 28 b9 01 00 00 00 4c 89 ef e8 a3 cf fe ff e9 1d f9 ff ff 48 8d 3d 27 29 59 09 <67> 48 0f b9 3a e9 99 f8 ff ff 48 b8 00 00 00 00 00 fc ff df 48 8d
RSP: 0018:ffffc90000537c10 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff8881f563a540 RCX: ffffffff81987a21
RDX: 0000000000000000 RSI: ffffffff87b08ce0 RDI: ffffffff8af21770
RBP: ffff888103eb8000 R08: 0000000000000000 R09: fffffbfff15e10da
R10: ffffffff8af086d7 R11: 0000000000000001 R12: 0000000000000000
R13: ffff888103eb847c R14: ffffffff8cf91680 R15: ffffffff8af09664
FS:  0000000000000000(0000) GS:ffff8882686a8000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055d53982e008 CR3: 0000000117564000 CR4: 00000000003506f0
Call Trace:
 <TASK>
 __schedule+0x25e/0x4840 kernel/sched/core.c:7043
 __schedule_loop kernel/sched/core.c:7267 [inline]
 schedule+0xdd/0x390 kernel/sched/core.c:7282
 worker_thread+0x53b/0xe50 kernel/workqueue.c:3481
 kthread+0x370/0x450 kernel/kthread.c:436
 ret_from_fork+0x69a/0xc80 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>
----------------
Code disassembly (best guess):
   0:	c1 ea 03             	shr    $0x3,%edx
   3:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
   7:	0f 85 9b 0b 00 00    	jne    0xba8
   d:	48 8b 53 28          	mov    0x28(%rbx),%rdx
  11:	b9 01 00 00 00       	mov    $0x1,%ecx
  16:	4c 89 ef             	mov    %r13,%rdi
  19:	e8 a3 cf fe ff       	call   0xfffecfc1
  1e:	e9 1d f9 ff ff       	jmp    0xfffff940
  23:	48 8d 3d 27 29 59 09 	lea    0x9592927(%rip),%rdi        # 0x9592951
* 2a:	67 48 0f b9 3a       	ud1    (%edx),%rdi <-- trapping instruction
  2f:	e9 99 f8 ff ff       	jmp    0xfffff8cd
  34:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  3b:	fc ff df
  3e:	48                   	rex.W
  3f:	8d                   	.byte 0x8d


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply

* Re: [PATCH RFC 01/17] lib/crc: add crc32c_flip_range() for incremental CRC update
From: Eric Biggers @ 2026-05-14  3:52 UTC (permalink / raw)
  To: Baokun Li
  Cc: linux-ext4, linux-crypto, ardb, tytso, adilger.kernel, jack,
	yi.zhang, ojaswin, ritesh.list
In-Reply-To: <20260508121539.4174601-2-libaokun@linux.alibaba.com>

On Fri, May 08, 2026 at 08:15:23PM +0800, Baokun Li wrote:
> When a contiguous range of bits in a buffer is flipped, the CRC32c
> checksum can be updated incrementally without re-scanning the entire
> buffer, by exploiting the linearity of CRCs over GF(2):
> 
>   New_CRC = Old_CRC ^ CRC(flip_mask << trailing_bits)
> 
> Introduce crc32c_flip_range() which computes this delta using
> precomputed GF(2) shift matrices and nibble-indexed lookup tables.
> The implementation decomposes nbits and trailing_bits into
> power-of-2 components and combines them via the CRC concatenation
> property:
> 
>   CRC(A || B) = shift(CRC(A), len(B)) ^ CRC(B)
> 
> This gives O(log N) complexity with only ~9.8KB of static tables
> (fits in L1 cache).  The current maximum supported buffer size is
> 64KB (INCR_MAX_ORDER = 19, i.e. 2^19 bits = 524288 bits = 64KB).

It will be a little while before I can do a full review of this, but
just a high-level comment: "only ~9.8KB of static tables (fits in L1
cache)" isn't ideal.  Large tables tend to microbenchmark well, then
have worse real-world performance due to lots of other things contending
for the L1 cache.

Another consideration is that basically every Linux kernel has
CONFIG_CRC32 enabled, regardless of whether they would actually find
this new functionality useful.

I'm not necessarily saying this should be its own option, especially if
it's useful for ext4 even in the non-LBS case.  But I do think it would
be nice if it could be a bit smaller and more memory-optimized.

Anyway, I'll look into the algorithm more when I have time.
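
For reference, the linearity the patch description relies on can be
checked with a table-free model.  This is a sketch only, with
hypothetical names; it is not the proposed crc32c_flip_range().  It
still scans the whole mask, so it is O(N) rather than O(log N), and
because it uses the usual all-ones seed and final inversion it needs
the extra CRC-of-zeros term.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t sketch_crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/*
 * CRC of (buffer XOR mask) computed from the old CRC alone:
 *   new = old ^ crc(mask) ^ crc(zeros)
 * where mask and zeros are both len bytes long and zeros is all-zero.
 */
static uint32_t sketch_crc32c_flip(uint32_t old_crc, const uint8_t *mask,
				   const uint8_t *zeros, size_t len)
{
	assert(len > 0);
	return old_crc ^ sketch_crc32c(mask, len) ^ sketch_crc32c(zeros, len);
}
```

Nothing in this model needs a lookup table, which is the opposite end
of the size/speed trade-off from the ~9.8KB of tables discussed above.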

- Eric

^ permalink raw reply

* [PATCH 0/4] iomap: trivial fixes for ext4 conversion
From: Zhang Yi @ 2026-05-14  6:29 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs
  Cc: linux-ext4, brauner, djwong, hch, yi.zhang, yi.zhang, yizhang089,
	yangerkun, yukuai

From: Zhang Yi <yi.zhang@huawei.com>

This patch series contains a few trivial iomap-related fixes in
preparation for converting ext4 buffered I/O to use iomap. 

The first three patches are taken from my ext4 conversion series [1], as
suggested by Christoph. The last patch fixes a bug originally reported
by Sashiko during review of my series; although unrelated to the ext4
conversion, it is worth fixing on its own. Please see the following
patches for details.

Thanks,
Yi.

[1] https://lore.kernel.org/linux-ext4/20260511072344.191271-1-yi.zhang@huaweicloud.com/

Zhang Yi (4):
  iomap: correct the range of a partial dirty clear
  iomap: support invalidating partial folios
  iomap: fix incorrect did_zero setting in iomap_zero_iter()
  iomap: fix out-of-bounds bitmap_set() with zero-length range

 fs/iomap/buffered-io.c | 45 +++++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 14 deletions(-)

-- 
2.52.0

^ permalink raw reply

* [PATCH 4/4] iomap: fix out-of-bounds bitmap_set() with zero-length range
From: Zhang Yi @ 2026-05-14  6:29 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs
  Cc: linux-ext4, brauner, djwong, hch, yi.zhang, yi.zhang, yizhang089,
	yangerkun, yukuai
In-Reply-To: <20260514062955.1183976-1-yi.zhang@huaweicloud.com>

From: Zhang Yi <yi.zhang@huawei.com>

ifs_set_range_dirty() and ifs_set_range_uptodate() compute last_blk
as (off + len - 1) >> i_blkbits.  When off is 0 and len is 0, the
unsigned subtraction underflows to SIZE_MAX, producing a huge
last_blk and nr_blks value that causes bitmap_set() to write far
beyond the ifs->state allocation.

ifs_set_range_uptodate() is safe for now because no caller passes
len == 0. However, for ifs_set_range_dirty() this is
reachable from __iomap_write_end(): when copy_folio_from_iter_atomic()
returns 0 (e.g. user buffer fault) and the folio is already uptodate,
the guard at the top of __iomap_write_end() does not trigger because
!folio_test_uptodate() is false, and iomap_set_range_dirty() is called
with copied == 0.

Add a !len guard to both functions before the computation, so that a
zero-length range is a no-op.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/iomap/buffered-io.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 27ab33edbdee..6fe5f7e998fd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -67,11 +67,14 @@ static bool ifs_set_range_uptodate(struct folio *folio,
 		struct iomap_folio_state *ifs, size_t off, size_t len)
 {
 	struct inode *inode = folio->mapping->host;
-	unsigned int first_blk = off >> inode->i_blkbits;
-	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
-	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned int first_blk, last_blk;
 
-	bitmap_set(ifs->state, first_blk, nr_blks);
+	if (!len)
+		return true;
+
+	first_blk = off >> inode->i_blkbits;
+	last_blk = (off + len - 1) >> inode->i_blkbits;
+	bitmap_set(ifs->state, first_blk, last_blk - first_blk + 1);
 	return ifs_is_fully_uptodate(folio, ifs);
 }
 
@@ -203,13 +206,17 @@ static void ifs_set_range_dirty(struct folio *folio,
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
-	unsigned int first_blk = (off >> inode->i_blkbits);
-	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
-	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned int first_blk, last_blk;
 	unsigned long flags;
 
+	if (!len)
+		return;
+
+	first_blk = off >> inode->i_blkbits;
+	last_blk = (off + len - 1) >> inode->i_blkbits;
 	spin_lock_irqsave(&ifs->state_lock, flags);
-	bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks);
+	bitmap_set(ifs->state, first_blk + blks_per_folio,
+		   last_blk - first_blk + 1);
 	spin_unlock_irqrestore(&ifs->state_lock, flags);
 }
 
-- 
2.52.0
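
The guard in this patch can be modelled in userspace as follows.  This
is a sketch with hypothetical names (sketch_blk_range,
sketch_touched_blocks), not kernel code; blkbits stands in for
inode->i_blkbits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type; not a kernel structure. */
struct sketch_blk_range {
	unsigned int first_blk;
	unsigned int nr_blks;
};

/*
 * Compute the blocks touched by the byte range [off, off + len) for a
 * block size of 1 << blkbits.  Returns false for an empty range, so
 * off + len - 1 is never evaluated with len == 0.
 */
static bool sketch_touched_blocks(size_t off, size_t len,
				  unsigned int blkbits,
				  struct sketch_blk_range *out)
{
	unsigned int last_blk;

	assert(blkbits < 32);
	if (!len)
		return false;

	out->first_blk = (unsigned int)(off >> blkbits);
	last_blk = (unsigned int)((off + len - 1) >> blkbits);
	out->nr_blks = last_blk - out->first_blk + 1;
	return true;
}
```

A caller would only touch the bitmap when the helper returns true,
which mirrors the early return added to both functions.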


^ permalink raw reply related

* [PATCH 3/4] iomap: fix incorrect did_zero setting in iomap_zero_iter()
From: Zhang Yi @ 2026-05-14  6:29 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs
  Cc: linux-ext4, brauner, djwong, hch, yi.zhang, yi.zhang, yizhang089,
	yangerkun, yukuai
In-Reply-To: <20260514062955.1183976-1-yi.zhang@huaweicloud.com>

From: Zhang Yi <yi.zhang@huawei.com>

The did_zero output parameter was unconditionally set after the loop,
which is incorrect. It should only be set when the zeroing operation
actually completes, not when IOMAP_F_STALE is set or when
IOMAP_F_FOLIO_BATCH is set but !folio causes the loop to break early,
or when iomap_iter_advance() returns an error.

This causes did_zero to be incorrectly set when zeroing a clean
unwritten extent because the loop exits early without actually zeroing
any data.

Fix it by using a local variable to track whether any folio was actually
zeroed, and only set did_zero after the loop if zeroing happened.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
This is taken from:
 https://lore.kernel.org/linux-fsdevel/20260310082250.3535486-1-yi.zhang@huaweicloud.com/
No changes.

 fs/iomap/buffered-io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 876c2f507f58..27ab33edbdee 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1542,6 +1542,7 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 		const struct iomap_write_ops *write_ops)
 {
 	u64 bytes = iomap_length(iter);
+	bool zeroed = false;
 	int status;
 
 	do {
@@ -1560,6 +1561,8 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 		/* a NULL folio means we're done with a folio batch */
 		if (!folio) {
 			status = iomap_iter_advance_full(iter);
+			if (status)
+				return status;
 			break;
 		}
 
@@ -1570,6 +1573,7 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 				bytes);
 
 		folio_zero_range(folio, offset, bytes);
+		zeroed = true;
 		folio_mark_accessed(folio);
 
 		ret = iomap_write_end(iter, bytes, bytes, folio);
@@ -1579,10 +1583,10 @@ static int iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
 
 		status = iomap_iter_advance(iter, bytes);
 		if (status)
-			break;
+			return status;
 	} while ((bytes = iomap_length(iter)) > 0);
 
-	if (did_zero)
+	if (did_zero && zeroed)
 		*did_zero = true;
 	return status;
 }
-- 
2.52.0
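
The control flow of the fix can be reduced to a small userspace model.
This is a simplified sketch: the enum and function names are
hypothetical, and each array element stands for one pass of the loop
in iomap_zero_iter() rather than for real folio handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for one pass of the zeroing loop. */
enum sketch_step {
	SKETCH_ZERO_FOLIO,	/* a folio was found and zeroed */
	SKETCH_NO_FOLIO,	/* nothing to zero, leave the loop early */
	SKETCH_ERROR,		/* advancing the iterator failed */
};

/*
 * Walk the steps and report through *did_zero only when at least one
 * step really zeroed something.  Errors return immediately and never
 * touch *did_zero.  did_zero may be NULL.
 */
static int sketch_zero_iter(const enum sketch_step *steps, size_t nr,
			    bool *did_zero)
{
	bool zeroed = false;
	size_t i;

	assert(steps || !nr);
	for (i = 0; i < nr; i++) {
		if (steps[i] == SKETCH_ERROR)
			return -1;
		if (steps[i] == SKETCH_NO_FOLIO)
			break;
		zeroed = true;
	}

	if (did_zero && zeroed)
		*did_zero = true;
	return 0;
}
```

With a single SKETCH_NO_FOLIO step, which loosely corresponds to the
clean unwritten extent case, the caller's flag stays false.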


^ permalink raw reply related

* [PATCH 1/4] iomap: correct the range of a partial dirty clear
From: Zhang Yi @ 2026-05-14  6:29 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs
  Cc: linux-ext4, brauner, djwong, hch, yi.zhang, yi.zhang, yizhang089,
	yangerkun, yukuai
In-Reply-To: <20260514062955.1183976-1-yi.zhang@huaweicloud.com>

From: Zhang Yi <yi.zhang@huawei.com>

The block range calculation in ifs_clear_range_dirty() is incorrect when
partially clearing a range in a folio. We cannot clear the dirty bit of
the first block or the last block if the start or end offset is not
blocksize-aligned. This has not yet caused any issues since we always
clear a whole folio in iomap_writeback_folio().

Fix this by rounding the first block up to blocksize alignment and
rounding the last block down (by truncation). Correct the nr_blks
calculation accordingly.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
This is modified from:
 https://lore.kernel.org/linux-fsdevel/20240812121159.3775074-2-yi.zhang@huaweicloud.com/
Changes:
 - Use round_up() instead of DIV_ROUND_UP() to prevent wasted integer
   division.

 fs/iomap/buffered-io.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d7b648421a70..64351a448a8b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -176,13 +176,17 @@ static void ifs_clear_range_dirty(struct folio *folio,
 {
 	struct inode *inode = folio->mapping->host;
 	unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
-	unsigned int first_blk = (off >> inode->i_blkbits);
-	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
-	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned int first_blk = round_up(off, i_blocksize(inode)) >>
+				 inode->i_blkbits;
+	unsigned int last_blk = (off + len) >> inode->i_blkbits;
 	unsigned long flags;
 
+	if (first_blk >= last_blk)
+		return;
+
 	spin_lock_irqsave(&ifs->state_lock, flags);
-	bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks);
+	bitmap_clear(ifs->state, first_blk + blks_per_folio,
+		     last_blk - first_blk);
 	spin_unlock_irqrestore(&ifs->state_lock, flags);
 }
 
-- 
2.52.0
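
To make the arithmetic in this fix concrete, here is a minimal userspace
sketch that compares the old and new block range calculations. It is not
kernel code: struct blk_range, old_range(), new_range() and
round_up_pow2() are hypothetical names introduced for illustration, and
only the shift and rounding expressions mirror the diff above.

```c
#include <assert.h>

/* Hypothetical helper type: half-open range of block indexes [first, end). */
struct blk_range {
	unsigned int first;
	unsigned int end;
};

/* Userspace stand-in for the kernel's round_up(); y must be a power of two. */
static unsigned int round_up_pow2(unsigned int x, unsigned int y)
{
	assert(y != 0 && (y & (y - 1)) == 0);
	return (x + y - 1) & ~(y - 1);
}

/* Old calculation: every block the byte range touches, even partially. */
static struct blk_range old_range(unsigned int off, unsigned int len,
				  unsigned int blkbits)
{
	struct blk_range r = {
		.first = off >> blkbits,
		.end = ((off + len - 1) >> blkbits) + 1,
	};
	return r;
}

/* New calculation: only blocks that lie entirely inside the byte range. */
static struct blk_range new_range(unsigned int off, unsigned int len,
				  unsigned int blkbits)
{
	struct blk_range r = {
		.first = round_up_pow2(off, 1u << blkbits) >> blkbits,
		.end = (off + len) >> blkbits,
	};

	/* Mirrors the early return in the patch: no block is fully covered. */
	if (r.first >= r.end)
		r.end = r.first;
	return r;
}
```

Assuming 1 KiB blocks (blkbits = 10), off = 512 and len = 2048, the old
calculation selects blocks 0 through 2, although blocks 0 and 2 are only
half covered, while the new one selects block 1 alone. For a
blocksize-aligned range both calculations agree.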


^ permalink raw reply related

* [PATCH 2/4] iomap: support invalidating partial folios
From: Zhang Yi @ 2026-05-14  6:29 UTC (permalink / raw)
  To: linux-fsdevel, linux-xfs
  Cc: linux-ext4, brauner, djwong, hch, yi.zhang, yi.zhang, yizhang089,
	yangerkun, yukuai
In-Reply-To: <20260514062955.1183976-1-yi.zhang@huaweicloud.com>

From: Zhang Yi <yi.zhang@huawei.com>

Currently, iomap_invalidate_folio() can only invalidate an entire folio.
If we truncate a partial folio on a filesystem where the block size is
smaller than the folio size, it leaves behind dirty bits for the
truncated or punched blocks. During writeback, we will then attempt to
map the invalid hole range. Fortunately, this has not caused any real
problems so far because the ->writeback_range() function corrects the
length.

However, the implementation of FALLOC_FL_ZERO_RANGE in ext4 depends on
support for invalidating partial folios. When ext4 partially zeroes out
a dirty and unwritten folio, it does not flush the folio first, as XFS
does. Therefore, if the dirty bits of the corresponding area cannot be
cleared, the zeroed area remains in the written state after writeback
rather than reverting to the unwritten state. Fix this by supporting
invalidation of partial folios.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
This is taken from:
 https://lore.kernel.org/linux-fsdevel/20240812121159.3775074-3-yi.zhang@huaweicloud.com/
No code changes; only the commit message is updated to explain why ext4
needs this.

 fs/iomap/buffered-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 64351a448a8b..876c2f507f58 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -761,6 +761,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 		WARN_ON_ONCE(folio_test_writeback(folio));
 		folio_cancel_dirty(folio);
 		ifs_free(folio);
+	} else {
+		iomap_clear_range_dirty(folio, offset, len);
 	}
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
-- 
2.52.0
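
The effect described in the commit message can be modelled in userspace
with a per-block dirty bitmask. This is only a sketch under stated
assumptions: struct folio_model, model_clear_range_dirty() and
model_invalidate() are hypothetical names, the 4 KiB folio with 1 KiB
blocks is an assumed geometry, and the whole-folio condition is assumed
rather than taken from the hunk above.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_BLKBITS    10u	/* assumed 1 KiB blocks */
#define MODEL_FOLIO_SIZE 4096u	/* assumed 4 KiB folio, so 4 blocks */

/* Hypothetical model of a folio: bit i set means block i is dirty. */
struct folio_model {
	uint32_t dirty;
};

/* Clear the dirty bit of every block fully covered by [off, off + len). */
static void model_clear_range_dirty(struct folio_model *f, size_t off,
				    size_t len)
{
	size_t blksz = (size_t)1 << MODEL_BLKBITS;
	size_t first = (off + blksz - 1) >> MODEL_BLKBITS;
	size_t end = (off + len) >> MODEL_BLKBITS;

	for (size_t i = first; i < end; i++)
		f->dirty &= ~(UINT32_C(1) << i);
}

/*
 * clear_partial == 0 models the behaviour before the patch (partial
 * invalidation leaves the dirty bits alone); clear_partial != 0 models
 * the added else branch.
 */
static void model_invalidate(struct folio_model *f, size_t off, size_t len,
			     int clear_partial)
{
	if (off == 0 && len == MODEL_FOLIO_SIZE)
		f->dirty = 0;		/* whole folio: drop all state */
	else if (clear_partial)
		model_clear_range_dirty(f, off, len);
}
```

With all four blocks dirty (dirty == 0xF), invalidating the second half
of the folio (off = 2048, len = 2048) leaves the mask at 0xF in the old
model and reduces it to 0x3 in the new one, so writeback no longer sees
the punched blocks as dirty.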

^ permalink raw reply related

* Re: [RFC v7 4/7] ext4: fast commit: avoid self-deadlock in inode snapshotting
From: Li Chen @ 2026-05-14  7:28 UTC (permalink / raw)
  To: sashiko; +Cc: tytso, Linux ext4
In-Reply-To: <20260512034640.71DA1C2BCB0@smtp.kernel.org>

Hi sashiko,

 ---- On Tue, 12 May 2026 11:46:39 +0800   <sashiko-bot@kernel.org> wrote --- 
 > Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
 > - [Critical] Removing igrab() exposes a critical use-after-free vulnerability because ext4_fc_del() can miss the EXT4_STATE_FC_COMMITTING state transition and free the inode while it is actively being snapshotted.
 > --
 > 
 > commit 39c5450f50154240ff764cf3a63e011d8e973fd4
 > Author: Li Chen <me@linux.beauty>
 > 
 > ext4: fast commit: avoid self-deadlock in inode snapshotting
 > 
 > This commit removes igrab()/iput() during fast commit snapshotting, relying
 > instead on the EXT4_STATE_FC_COMMITTING flag to keep inodes pinned while
 > their pointers are stored in the inodes[] array for snapshotting.
 > 
 > > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
 > > index 273bf34031aeb..f9bb18c0b5497 100644
 > > --- a/fs/ext4/fast_commit.c
 > > +++ b/fs/ext4/fast_commit.c
 > > @@ -1195,13 +1195,12 @@ static int ext4_fc_snapshot_inodes(journal_t *journal)
 > >  
 > >      alloc_ctx = ext4_fc_lock(sb);
 > >      list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
 > > -        inodes[i] = igrab(&iter->vfs_inode);
 > > -        if (inodes[i])
 > > -            i++;
 > > +        inodes[i++] = &iter->vfs_inode;
 > >      }
 > 
 > Is it possible for this change to cause a use-after-free of the inode?
 > 
 > If ext4_fc_del() is called during inode eviction, it waits sequentially for
 > EXT4_STATE_FC_COMMITTING and then EXT4_STATE_FC_FLUSHING_DATA. However, it
 > doesn't appear to loop back to re-check EXT4_STATE_FC_COMMITTING after
 > waiting for the flush bit.
 > 
 > Could a sequence like this occur?
 > 
 > 1. ext4_fc_perform_commit() sets EXT4_STATE_FC_FLUSHING_DATA.
 > 2. Eviction calls ext4_fc_del(), which sees EXT4_STATE_FC_COMMITTING is
 >    false, but EXT4_STATE_FC_FLUSHING_DATA is true, so it waits on the
 >    flush bit.
 > 3. ext4_fc_perform_commit() clears EXT4_STATE_FC_FLUSHING_DATA, waking up
 >    ext4_fc_del().
 > 4. Before ext4_fc_del() resumes, ext4_fc_perform_commit() sets
 >    EXT4_STATE_FC_COMMITTING and saves the unpinned inode pointer into the
 >    inodes[] array.
 > 5. ext4_fc_del() resumes, skips re-checking EXT4_STATE_FC_COMMITTING,
 >    removes the inode from the fast commit list, and returns.
 > 6. The VFS completely frees the inode.
 > 7. ext4_fc_snapshot_inodes() iterates over the inodes[] array and passes
 >    the now dangling pointer to ext4_fc_snapshot_inode().
 > 
 > Does ext4_fc_del() need to re-verify the state flags after waking up, or
 > is there another mechanism preventing the inode from being freed before
 > ext4_fc_snapshot_inode() accesses it?
 > 
 > [ ... ]
 > 
 > -- 
 > Sashiko AI review · https://sashiko.dev/#/patchset/20260511084304.1559557-1-me@linux.beauty?part=4
 > 

Yes, I think you're right. I missed that window.

I'll make ext4_fc_del() loop back after waiting on FC_FLUSHING_DATA, so it checks FC_COMMITTING again
under s_fc_lock before removing the inode from the FC lists.

Regards,
Li
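
The interleaving from the review, and the effect of looping back, can be
replayed deterministically in a small userspace model. This is a toy
state machine, not the ext4 code: the DEL_* states, del_step() and
replay_race() are hypothetical names, the two flags stand in for
EXT4_STATE_FC_COMMITTING and EXT4_STATE_FC_FLUSHING_DATA, and locking is
left out entirely.

```c
#include <stdbool.h>

#define FC_COMMITTING    (1u << 0)	/* stands in for EXT4_STATE_FC_COMMITTING */
#define FC_FLUSHING_DATA (1u << 1)	/* stands in for EXT4_STATE_FC_FLUSHING_DATA */

/* Hypothetical positions of the modelled ext4_fc_del() caller. */
enum del_state { DEL_CHECK_COMMITTING, DEL_WAIT_FLUSHING, DEL_REMOVED };

/*
 * Let the deleter run once against the current flags. With recheck set,
 * it loops back to the FC_COMMITTING check after the flush wait ends.
 */
static enum del_state del_step(enum del_state s, unsigned int flags,
			       bool recheck)
{
	if (s == DEL_CHECK_COMMITTING) {
		if (flags & FC_COMMITTING)
			return DEL_CHECK_COMMITTING;	/* keep waiting */
		s = DEL_WAIT_FLUSHING;
	}
	if (s == DEL_WAIT_FLUSHING) {
		if (flags & FC_FLUSHING_DATA)
			return DEL_WAIT_FLUSHING;	/* keep waiting */
		if (recheck && (flags & FC_COMMITTING))
			return DEL_CHECK_COMMITTING;	/* loop back */
		return DEL_REMOVED;
	}
	return s;
}

/* Returns true if the inode is removed while FC_COMMITTING is still set. */
static bool replay_race(bool recheck)
{
	unsigned int flags = FC_FLUSHING_DATA;			/* step 1 */
	enum del_state d;

	d = del_step(DEL_CHECK_COMMITTING, flags, recheck);	/* step 2 */
	flags &= ~FC_FLUSHING_DATA;				/* step 3 */
	flags |= FC_COMMITTING;					/* step 4 */
	d = del_step(d, flags, recheck);			/* step 5 */

	return d == DEL_REMOVED && (flags & FC_COMMITTING);
}
```

In this model replay_race(false) reports the removal during the commit,
while replay_race(true) leaves the deleter waiting on FC_COMMITTING.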


^ permalink raw reply

* Re: [RFC v7 6/7] ext4: fast commit: add lock_updates tracepoint
From: Li Chen @ 2026-05-14 11:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Zhang Yi, Theodore Ts'o, Andreas Dilger, Baokun Li, Jan Kara,
	Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, Masami Hiramatsu,
	Mathieu Desnoyers, linux-ext4, linux-kernel, linux-trace-kernel
In-Reply-To: <20260513135741.12ddb97d@gandalf.local.home>

Hi Steven,

 ---- On Thu, 14 May 2026 01:57:41 +0800  Steven Rostedt <rostedt@goodmis.org> wrote --- 
 > On Mon, 11 May 2026 16:43:01 +0800
 > Li Chen <me@linux.beauty> wrote:
 > 
 > > @@ -1346,8 +1383,15 @@ static int ext4_fc_perform_commit(journal_t *journal)
 > >      }
 > >      ext4_fc_unlock(sb, alloc_ctx);
 > >  
 > > -    ret = ext4_fc_snapshot_inodes(journal, inodes, inodes_size);
 > > +    ret = ext4_fc_snapshot_inodes(journal, inodes, inodes_size,
 > > +                      &snap_inodes, &snap_ranges, &snap_err);
 > >      jbd2_journal_unlock_updates(journal);
 > > +    if (trace_ext4_fc_lock_updates_enabled()) {
 > > +        locked_ns = ktime_to_ns(ktime_sub(ktime_get(), lock_start));
 > > +        trace_ext4_fc_lock_updates(sb, commit_tid, locked_ns,
 > > +                       snap_inodes, snap_ranges, ret,
 > > +                       snap_err);
 > 
 > Please change this to:
 > 
 >         trace_call__ext4_fc_lock_updates(...)
 > 
 > As the "trace_ext4_fc_lock_updates_enabled()" already has the static
 > branch. No need to do it twice anymore. 7.1 introduced the
 > "trace_call__foo()" that will do a direct call to the tracepoints
 > registered, without the need for another static branch.

Thanks, will do it.


Regards,
Li


^ permalink raw reply

* Re: [PATCH] ext4: fix fast commit wait/wake bit mapping on 64-bit
From: Zhang Yi @ 2026-05-14 12:00 UTC (permalink / raw)
  To: Li Chen, Theodore Ts'o
  Cc: Andreas Dilger, Baokun Li, Jan Kara, Ojaswin Mujoo,
	Ritesh Harjani, linux-ext4, linux-kernel, Sashiko AI review
In-Reply-To: <20260513085818.552432-1-me@linux.beauty>

On 5/13/2026 4:58 PM, Li Chen wrote:
> On 64-bit, ext4 dynamic inode states live in the upper half of i_flags,
> and ext4_test_inode_state() applies the corresponding +32 offset.
> 
> The fast-commit wait and wake paths open-coded the wait key with the raw
> EXT4_STATE_* value. Add small helpers for the state wait word and bit,
> and use them for the FC_COMMITTING and FC_FLUSHING_DATA waits so the wait
> key follows the same mapping as the state helpers.
> 
> Fixes: 857d32f26181 ("ext4: rework fast commit commit path")
> Reported-by: Sashiko AI review <sashiko-bot@kernel.org>
> Signed-off-by: Li Chen <chenl311@chinatelecom.cn>

Ha, this looks good to me! Thanks.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
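
As a small illustration of the mapping described in the commit message,
the userspace sketch below models the 64-bit i_flags word as a uint64_t.
The names state_wait_bit(), test_bit64() and EXAMPLE_STATE_BIT are
hypothetical, and the bit number 3 is an arbitrary example rather than a
real EXT4_STATE_* value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary example; the real EXT4_STATE_* numbers live in fs/ext4/ext4.h. */
#define EXAMPLE_STATE_BIT 3

/* Same mapping as the new helper: +32 only when longs are 64 bits wide. */
static int state_wait_bit(int bit, int bits_per_long)
{
	return bits_per_long < 64 ? bit : bit + 32;
}

/* Test one bit of a modelled 64-bit flags word. */
static bool test_bit64(uint64_t word, int bit)
{
	return (word >> bit) & 1u;
}
```

Setting the state through the mapped position, for example
flags |= UINT64_C(1) << state_wait_bit(EXAMPLE_STATE_BIT, 64), sets bit
35 of the word and leaves bit 3 clear. A wait key built from the raw
value therefore names a different bit than the one the state helpers
set and clear, which is the mismatch the patch removes.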

> ---
>  fs/ext4/ext4.h        | 20 +++++++++++++++++
>  fs/ext4/fast_commit.c | 50 ++++++++++++++++---------------------------
>  2 files changed, 38 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 94283a991e5c..6569d1d575a0 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2000,6 +2000,8 @@ EXT4_INODE_BIT_FNS(flag, flags, 0)
>  static inline int ext4_test_inode_state(struct inode *inode, int bit);
>  static inline void ext4_set_inode_state(struct inode *inode, int bit);
>  static inline void ext4_clear_inode_state(struct inode *inode, int bit);
> +static inline unsigned long *ext4_inode_state_wait_word(struct inode *inode);
> +static inline int ext4_inode_state_wait_bit(int bit);
>  #if (BITS_PER_LONG < 64)
>  EXT4_INODE_BIT_FNS(state, state_flags, 0)
>  
> @@ -2015,6 +2017,24 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
>  	/* We depend on the fact that callers will set i_flags */
>  }
>  #endif
> +
> +static inline unsigned long *ext4_inode_state_wait_word(struct inode *inode)
> +{
> +#if (BITS_PER_LONG < 64)
> +	return &EXT4_I(inode)->i_state_flags;
> +#else
> +	return &EXT4_I(inode)->i_flags;
> +#endif
> +}
> +
> +static inline int ext4_inode_state_wait_bit(int bit)
> +{
> +#if (BITS_PER_LONG < 64)
> +	return bit;
> +#else
> +	return bit + 32;
> +#endif
> +}
>  #else
>  /* Assume that user mode programs are passing in an ext4fs superblock, not
>   * a kernel struct super_block.  This will allow us to call the feature-test
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index b3c22636251d..1775bce9649a 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -239,6 +239,8 @@ void ext4_fc_del(struct inode *inode)
>  	struct ext4_inode_info *ei = EXT4_I(inode);
>  	struct ext4_fc_dentry_update *fc_dentry;
>  	wait_queue_head_t *wq;
> +	unsigned long *wait_word = ext4_inode_state_wait_word(inode);
> +	int wait_bit = ext4_inode_state_wait_bit(EXT4_STATE_FC_FLUSHING_DATA);
>  	int alloc_ctx;
>  
>  	if (ext4_fc_disabled(inode->i_sb))
> @@ -268,17 +270,9 @@ void ext4_fc_del(struct inode *inode)
>  	WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
>  		&& !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
>  	while (ext4_test_inode_state(inode, EXT4_STATE_FC_FLUSHING_DATA)) {
> -#if (BITS_PER_LONG < 64)
> -		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> -				EXT4_STATE_FC_FLUSHING_DATA);
> -		wq = bit_waitqueue(&ei->i_state_flags,
> -				   EXT4_STATE_FC_FLUSHING_DATA);
> -#else
> -		DEFINE_WAIT_BIT(wait, &ei->i_flags,
> -				EXT4_STATE_FC_FLUSHING_DATA);
> -		wq = bit_waitqueue(&ei->i_flags,
> -				   EXT4_STATE_FC_FLUSHING_DATA);
> -#endif
> +		DEFINE_WAIT_BIT(wait, wait_word, wait_bit);
> +
> +		wq = bit_waitqueue(wait_word, wait_bit);
>  		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
>  		if (ext4_test_inode_state(inode, EXT4_STATE_FC_FLUSHING_DATA)) {
>  			ext4_fc_unlock(inode->i_sb, alloc_ctx);
> @@ -542,6 +536,8 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
>  {
>  	struct ext4_inode_info *ei = EXT4_I(inode);
>  	wait_queue_head_t *wq;
> +	unsigned long *wait_word = ext4_inode_state_wait_word(inode);
> +	int wait_bit = ext4_inode_state_wait_bit(EXT4_STATE_FC_COMMITTING);
>  	int ret;
>  
>  	if (S_ISDIR(inode->i_mode))
> @@ -564,17 +560,9 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
>  	lockdep_assert_not_held(&ei->i_data_sem);
>  
>  	while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> -#if (BITS_PER_LONG < 64)
> -		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> -				EXT4_STATE_FC_COMMITTING);
> -		wq = bit_waitqueue(&ei->i_state_flags,
> -				   EXT4_STATE_FC_COMMITTING);
> -#else
> -		DEFINE_WAIT_BIT(wait, &ei->i_flags,
> -				EXT4_STATE_FC_COMMITTING);
> -		wq = bit_waitqueue(&ei->i_flags,
> -				   EXT4_STATE_FC_COMMITTING);
> -#endif
> +		DEFINE_WAIT_BIT(wait, wait_word, wait_bit);
> +
> +		wq = bit_waitqueue(wait_word, wait_bit);
>  		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
>  		if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
>  			schedule();
> @@ -1034,6 +1022,8 @@ static int ext4_fc_perform_commit(journal_t *journal)
>  	int ret = 0;
>  	u32 crc = 0;
>  	int alloc_ctx;
> +	int flushing_wait_bit =
> +		ext4_inode_state_wait_bit(EXT4_STATE_FC_FLUSHING_DATA);
>  
>  	/*
>  	 * Step 1: Mark all inodes on s_fc_q[MAIN] with
> @@ -1059,11 +1049,8 @@ static int ext4_fc_perform_commit(journal_t *journal)
>  	list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
>  		ext4_clear_inode_state(&iter->vfs_inode,
>  				       EXT4_STATE_FC_FLUSHING_DATA);
> -#if (BITS_PER_LONG < 64)
> -		wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_FLUSHING_DATA);
> -#else
> -		wake_up_bit(&iter->i_flags, EXT4_STATE_FC_FLUSHING_DATA);
> -#endif
> +		wake_up_bit(ext4_inode_state_wait_word(&iter->vfs_inode),
> +			    flushing_wait_bit);
>  	}
>  
>  	/*
> @@ -1279,6 +1266,8 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>  	struct ext4_inode_info *ei;
>  	struct ext4_fc_dentry_update *fc_dentry;
>  	int alloc_ctx;
> +	int committing_wait_bit =
> +		ext4_inode_state_wait_bit(EXT4_STATE_FC_COMMITTING);
>  
>  	if (full && sbi->s_fc_bh)
>  		sbi->s_fc_bh = NULL;
> @@ -1315,11 +1304,8 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>  		 * barrier in prepare_to_wait() in ext4_fc_track_inode().
>  		 */
>  		smp_mb();
> -#if (BITS_PER_LONG < 64)
> -		wake_up_bit(&ei->i_state_flags, EXT4_STATE_FC_COMMITTING);
> -#else
> -		wake_up_bit(&ei->i_flags, EXT4_STATE_FC_COMMITTING);
> -#endif
> +		wake_up_bit(ext4_inode_state_wait_word(&ei->vfs_inode),
> +			    committing_wait_bit);
>  	}
>  
>  	while (!list_empty(&sbi->s_fc_dentry_q[FC_Q_MAIN])) {


^ permalink raw reply
