* [PATCH v2] ext4: drop s_writepages_rwsem around ext4_destroy_inline_data
2026-06-09 15:45 [PATCH] ext4: move inline data cleanup to ext4_writepages to fix deadlock Yun Zhou
@ 2026-06-10 5:08 ` Yun Zhou
2026-06-10 6:37 ` [PATCH v3] ext4: drop s_writepages_rwsem around inline data handling in writepages Yun Zhou
2026-06-10 8:06 ` [syzbot ci] Re: ext4: move inline data cleanup to ext4_writepages to fix deadlock syzbot ci
2 siblings, 0 replies; 4+ messages in thread
From: Yun Zhou @ 2026-06-10 5:08 UTC
To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
yi.zhang, ebiggers, yun.zhou
Cc: linux-ext4, linux-kernel
ext4_do_writepages() calls ext4_destroy_inline_data(), which acquires
xattr_sem while s_writepages_rwsem is held for read. This creates a
circular lock dependency:

        CPU0                              CPU1
        ----                              ----
  ext4_writepages()
    ext4_writepages_down_read()
    [holds s_writepages_rwsem]
                                    ext4_evict_inode()
                                      __ext4_mark_inode_dirty()
                                        ext4_expand_extra_isize_ea()
                                          ext4_xattr_block_set()
                                          [holds xattr_sem]
                                            iput(old_bh inode)
                                              write_inode_now()
                                                ext4_writepages()
                                                  ext4_writepages_down_read()
                                                  [BLOCKED on s_writepages_rwsem]
    ext4_do_writepages()
      ext4_destroy_inline_data()
        down_write(xattr_sem)
        [BLOCKED on xattr_sem]

Fix by temporarily dropping s_writepages_rwsem around the call to
ext4_destroy_inline_data(). This is safe because:
- This code runs before any block mapping or IO submission, so no
writepages state depends on the rwsem being held at this point.
- Inline data destruction is a one-way format transition (once cleared,
EXT4_INODE_INLINE_DATA is never set again). The rwsem is
re-acquired immediately after, ensuring format stability for the
remainder of writepages.
- The can_map flag naturally identifies the ext4_writepages() path
(holds rwsem) vs ext4_normal_submit_inode_data_buffers() (does not),
so the drop/reacquire is skipped when the rwsem is not held.
Also check the return value of ext4_destroy_inline_data(). It was
previously ignored, so a failure would leave inline data intact while
writepages proceeded as if the inode already had a block-mapped layout.
Reported-by: syzbot+bb2455d02bda0b5701e3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bb2455d02bda0b5701e3
Fixes: c8585c6fcaf2 ("ext4: fix races between changing inode journal mode and ext4_writepages")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v2:
- Instead of moving inline data handling to ext4_writepages(),
temporarily drop s_writepages_rwsem around ext4_destroy_inline_data()
in ext4_do_writepages(). The move approach had a race where concurrent
writes could create dirty pages with inline data after the early check,
and unconditional destruction without dirty pages would lose data.
fs/ext4/inode.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c2c2d6ac7f3d..7ec16adf4685 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1694,6 +1694,9 @@ struct mpage_da_data {
 	struct writeback_control *wbc;
 	unsigned int can_map:1;	/* Can writepages call map blocks? */
 
+	/* Saved memalloc context from ext4_writepages_down_read() */
+	int alloc_ctx;
+
 	/* These are internal state of ext4_do_writepages() */
 	loff_t start_pos;	/* The start pos to write */
 	loff_t next_pos;	/* Current pos to examine */
@@ -2824,8 +2827,21 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 		}
 		BUG_ON(ext4_test_inode_state(inode,
 				EXT4_STATE_MAY_INLINE_DATA));
-		ext4_destroy_inline_data(handle, inode);
+		/*
+		 * Temporarily drop s_writepages_rwsem because
+		 * ext4_destroy_inline_data() acquires xattr_sem, which has
+		 * a higher lock ordering rank. Holding both would create a
+		 * circular dependency with ext4_xattr_block_set() -> iput()
+		 * -> ext4_writepages() -> s_writepages_rwsem.
+		 */
+		if (mpd->can_map)
+			ext4_writepages_up_read(inode->i_sb, mpd->alloc_ctx);
+		ret = ext4_destroy_inline_data(handle, inode);
+		if (mpd->can_map)
+			mpd->alloc_ctx = ext4_writepages_down_read(inode->i_sb);
 		ext4_journal_stop(handle);
+		if (ret)
+			goto out_writepages;
 	}
 
 	/*
@@ -3032,13 +3048,12 @@ static int ext4_writepages(struct address_space *mapping,
 		.can_map = 1,
 	};
 	int ret;
-	int alloc_ctx;
 
 	ret = ext4_emergency_state(sb);
 	if (unlikely(ret))
 		return ret;
 
-	alloc_ctx = ext4_writepages_down_read(sb);
+	mpd.alloc_ctx = ext4_writepages_down_read(sb);
 	ret = ext4_do_writepages(&mpd);
 	/*
 	 * For data=journal writeback we could have come across pages marked
@@ -3047,7 +3062,7 @@ static int ext4_writepages(struct address_space *mapping,
 	 */
 	if (!ret && mpd.journalled_more_data)
 		ret = ext4_do_writepages(&mpd);
-	ext4_writepages_up_read(sb, alloc_ctx);
+	ext4_writepages_up_read(sb, mpd.alloc_ctx);
 
 	return ret;
 }
--
2.43.0

* [PATCH v3] ext4: drop s_writepages_rwsem around inline data handling in writepages
From: Yun Zhou @ 2026-06-10 6:37 UTC
To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
yi.zhang, ebiggers, yun.zhou
Cc: linux-ext4, linux-kernel
ext4_do_writepages() calls ext4_destroy_inline_data(), which acquires
xattr_sem while s_writepages_rwsem is held for read. This creates a
circular lock dependency:

        CPU0                              CPU1
        ----                              ----
  ext4_writepages()
    ext4_writepages_down_read()
    [holds s_writepages_rwsem]
                                    ext4_evict_inode()
                                      __ext4_mark_inode_dirty()
                                        ext4_expand_extra_isize_ea()
                                          ext4_xattr_block_set()
                                          [holds xattr_sem]
                                            iput(old_bh inode)
                                              write_inode_now()
                                                ext4_writepages()
                                                  ext4_writepages_down_read()
                                                  [BLOCKED on s_writepages_rwsem]
    ext4_do_writepages()
      ext4_destroy_inline_data()
        down_write(xattr_sem)
        [BLOCKED on xattr_sem]

Fix by temporarily dropping s_writepages_rwsem for the entire inline
data handling block, including the journal handle start/stop. The
rwsem must be dropped before ext4_journal_start() -- not between
journal_start and journal_stop -- to avoid a secondary deadlock with
ext4_change_inode_journal_flag() which takes rwsem (write) and then
calls jbd2_journal_lock_updates() waiting for active handles to stop.
This is safe because:
- This code runs before any block mapping or IO submission, so no
writepages state depends on the rwsem being held at this point.
- Inline data destruction is a one-way format transition (once cleared,
EXT4_INODE_INLINE_DATA is never set again). The rwsem is
re-acquired after journal_stop, ensuring format stability for the
remainder of writepages.
- The can_map flag identifies the ext4_writepages() path (holds rwsem)
vs ext4_normal_submit_inode_data_buffers() (does not), so the
drop/reacquire is skipped when the rwsem is not held.
Also check the return value of ext4_destroy_inline_data() to avoid
proceeding with an inconsistent inode format on failure.
Reported-by: syzbot+bb2455d02bda0b5701e3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bb2455d02bda0b5701e3
Fixes: c8585c6fcaf2 ("ext4: fix races between changing inode journal mode and ext4_writepages")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
v3: Drop s_writepages_rwsem before ext4_journal_start() and reacquire
after ext4_journal_stop(), instead of dropping between journal_start
and journal_stop as in v2. This avoids two issues identified in v2
review:
- memalloc_nofs_restore() in ext4_writepages_up_read() would clear
PF_MEMALLOC_NOFS while the jbd2 handle is active.
- Reacquiring s_writepages_rwsem while holding a handle creates an
ABBA deadlock with ext4_change_inode_journal_flag() which takes
the rwsem (write) then calls jbd2_journal_lock_updates().
v2: Instead of moving inline data handling to ext4_writepages(),
temporarily drop s_writepages_rwsem around ext4_destroy_inline_data()
in ext4_do_writepages(). The move approach had a race where concurrent
writes could create dirty pages with inline data after the early check,
and unconditional destruction without dirty pages would lose data.
v1: Moved inline data cleanup from ext4_do_writepages() to
ext4_writepages() before acquiring s_writepages_rwsem.
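
The ordering that v3 settles on (drop the rwsem, start the handle,
destroy inline data, stop the handle, reacquire the rwsem) can be
modelled with a small event trace. This is a hedged sketch only: the
demo_* names and the trace structure are hypothetical, and the "locks"
are plain flags with no real jbd2 or rwsem semantics.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_event {
	EV_RWSEM_UP,
	EV_HANDLE_START,
	EV_DESTROY,
	EV_HANDLE_STOP,
	EV_RWSEM_DOWN,
};

/* Hypothetical model state: a trace of events plus lock/handle flags. */
struct demo_state {
	enum demo_event trace[8];
	int n;
	bool rwsem_held;
	int open_handles;
};

static void demo_record(struct demo_state *s, enum demo_event ev)
{
	assert(s->n < 8);
	s->trace[s->n++] = ev;
}

static void demo_up_read(struct demo_state *s)
{
	assert(s->rwsem_held);
	s->rwsem_held = false;
	demo_record(s, EV_RWSEM_UP);
}

static void demo_down_read(struct demo_state *s)
{
	/* Rule modelled from the v3 notes: never reacquire with a handle open. */
	assert(s->open_handles == 0);
	s->rwsem_held = true;
	demo_record(s, EV_RWSEM_DOWN);
}

/* start_fails simulates ext4_journal_start() returning an error. */
static int demo_inline_cleanup(struct demo_state *s, bool can_map,
			       bool start_fails)
{
	if (can_map)
		demo_up_read(s);
	if (start_fails) {
		/* Error path: restore the lock state the caller expects. */
		if (can_map)
			demo_down_read(s);
		return -1;
	}
	s->open_handles++;
	demo_record(s, EV_HANDLE_START);
	demo_record(s, EV_DESTROY);
	s->open_handles--;
	demo_record(s, EV_HANDLE_STOP);
	if (can_map)
		demo_down_read(s);
	return 0;
}
```

On the success path the trace is up, start, destroy, stop, down. On the
failure path it is up, down. In both cases the caller gets the rwsem
back in the state it expects, which is what the extra reacquire in the
IS_ERR(handle) branch of the patch is for.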
fs/ext4/inode.c | 31 ++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c2c2d6ac7f3d..cd7588a3fa45 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1694,6 +1694,9 @@ struct mpage_da_data {
 	struct writeback_control *wbc;
 	unsigned int can_map:1;	/* Can writepages call map blocks? */
 
+	/* Saved memalloc context from ext4_writepages_down_read() */
+	int alloc_ctx;
+
 	/* These are internal state of ext4_do_writepages() */
 	loff_t start_pos;	/* The start pos to write */
 	loff_t next_pos;	/* Current pos to examine */
@@ -2816,16 +2819,35 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 	 * we'd better clear the inline data here.
 	 */
 	if (ext4_has_inline_data(inode)) {
-		/* Just inode will be modified... */
+		/*
+		 * Temporarily drop s_writepages_rwsem because
+		 * ext4_destroy_inline_data() acquires xattr_sem, which has
+		 * a higher lock ordering rank. Holding both would create a
+		 * circular dependency with ext4_xattr_block_set() -> iput()
+		 * -> ext4_writepages() -> s_writepages_rwsem.
+		 *
+		 * Drop the rwsem before starting the journal handle to also
+		 * avoid a deadlock with ext4_change_inode_journal_flag(),
+		 * which takes rwsem (write) then jbd2_journal_lock_updates().
+		 */
+		if (mpd->can_map)
+			ext4_writepages_up_read(inode->i_sb, mpd->alloc_ctx);
 		handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
 		if (IS_ERR(handle)) {
+			if (mpd->can_map)
+				mpd->alloc_ctx =
+					ext4_writepages_down_read(inode->i_sb);
 			ret = PTR_ERR(handle);
 			goto out_writepages;
 		}
 		BUG_ON(ext4_test_inode_state(inode,
 				EXT4_STATE_MAY_INLINE_DATA));
-		ext4_destroy_inline_data(handle, inode);
+		ret = ext4_destroy_inline_data(handle, inode);
 		ext4_journal_stop(handle);
+		if (mpd->can_map)
+			mpd->alloc_ctx = ext4_writepages_down_read(inode->i_sb);
+		if (ret)
+			goto out_writepages;
 	}
 
 	/*
@@ -3032,13 +3054,12 @@ static int ext4_writepages(struct address_space *mapping,
 		.can_map = 1,
 	};
 	int ret;
-	int alloc_ctx;
 
 	ret = ext4_emergency_state(sb);
 	if (unlikely(ret))
 		return ret;
 
-	alloc_ctx = ext4_writepages_down_read(sb);
+	mpd.alloc_ctx = ext4_writepages_down_read(sb);
 	ret = ext4_do_writepages(&mpd);
 	/*
 	 * For data=journal writeback we could have come across pages marked
@@ -3047,7 +3068,7 @@ static int ext4_writepages(struct address_space *mapping,
 	 */
 	if (!ret && mpd.journalled_more_data)
 		ret = ext4_do_writepages(&mpd);
-	ext4_writepages_up_read(sb, alloc_ctx);
+	ext4_writepages_up_read(sb, mpd.alloc_ctx);
 
 	return ret;
 }
--
2.43.0

* [syzbot ci] Re: ext4: move inline data cleanup to ext4_writepages to fix deadlock
From: syzbot ci @ 2026-06-10 8:06 UTC
To: adilger.kernel, daeho.jeong, jack, libaokun, linux-ext4,
linux-kernel, ojaswin, ritesh.list, tytso, yi.zhang, yun.zhou
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] ext4: move inline data cleanup to ext4_writepages to fix deadlock
https://lore.kernel.org/all/20260609154505.2104659-1-yun.zhou@windriver.com
* [PATCH] ext4: move inline data cleanup to ext4_writepages to fix deadlock
and found the following issue:
kernel BUG in ext4_writepages
Full report is available here:
https://ci.syzbot.org/series/1ede6029-df2a-4e08-bffc-05540c1f4934
***
kernel BUG in ext4_writepages
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: 2d3090a8aeb596a26935db0955d46c9a5db5c6ce
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/63ee0324-2d17-4b32-aca2-c6230ff64be6/config
syz repro: https://ci.syzbot.org/findings/676a447c-ea73-43ea-9949-054dac1961e5/syz_repro
EXT4-fs warning (device loop2): ext4_expand_extra_isize_ea:2860: Unable to expand inode 15. Delete some EAs or run e2fsck.
------------[ cut here ]------------
kernel BUG at fs/ext4/inode.c:3047!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 5875 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:ext4_writepages+0x622/0x630 fs/ext4/inode.c:3046
Code: ff e9 61 fc ff ff 44 89 f1 80 e1 07 80 c1 03 38 c1 0f 8c de fc ff ff 4c 89 f7 e8 f9 2f a8 ff e9 d1 fc ff ff e8 ef d7 3c ff 90 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90
RSP: 0018:ffffc900034df2e0 EFLAGS: 00010293
RAX: ffffffff8288dfb1 RBX: 1ffff9200069be60 RCX: ffff888110555940
RDX: 0000000000000000 RSI: 0000004000000000 RDI: 0000000000000000
RBP: ffffc900034df410 R08: ffff8881b48c2f0f R09: 1ffff110369185e1
R10: dffffc0000000000 R11: ffffed10369185e2 R12: dffffc0000000000
R13: 0000004000000000 R14: 0000004610000000 R15: 1ffff11020c6fcc5
FS: 00007f033e9346c0(0000) GS:ffff8882a92a0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005556842f3058 CR3: 000000016d5c0000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_writepages+0x32e/0x550 mm/page-writeback.c:2571
__writeback_single_inode+0x133/0x10e0 fs/fs-writeback.c:1764
writeback_single_inode+0x4ac/0xdc0 fs/fs-writeback.c:1883
write_inode_now+0x1c2/0x290 fs/fs-writeback.c:2974
iput_final fs/inode.c:1950 [inline]
iput+0x8c1/0xe80 fs/inode.c:2009
ext4_orphan_cleanup+0xc38/0x1470 fs/ext4/orphan.c:472
__ext4_fill_super fs/ext4/super.c:5701 [inline]
ext4_fill_super+0x5a19/0x6330 fs/ext4/super.c:5824
get_tree_bdev_flags+0x431/0x4f0 fs/super.c:1694
vfs_get_tree+0x92/0x2a0 fs/super.c:1754
fc_mount fs/namespace.c:1193 [inline]
do_new_mount_fc fs/namespace.c:3758 [inline]
do_new_mount+0x341/0xd30 fs/namespace.c:3834
do_mount fs/namespace.c:4167 [inline]
__do_sys_mount fs/namespace.c:4383 [inline]
__se_sys_mount+0x31d/0x420 fs/namespace.c:4360
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f033d99e0ca
Code: 48 c7 c2 e8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f033e933e58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f033e933ee0 RCX: 00007f033d99e0ca
RDX: 0000200000000040 RSI: 00002000000016c0 RDI: 00007f033e933ea0
RBP: 0000200000000040 R08: 00007f033e933ee0 R09: 000000000000840e
R10: 000000000000840e R11: 0000000000000246 R12: 00002000000016c0
R13: 00007f033e933ea0 R14: 000000000000042f R15: 0000200000000080
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:ext4_writepages+0x622/0x630 fs/ext4/inode.c:3046
Code: ff e9 61 fc ff ff 44 89 f1 80 e1 07 80 c1 03 38 c1 0f 8c de fc ff ff 4c 89 f7 e8 f9 2f a8 ff e9 d1 fc ff ff e8 ef d7 3c ff 90 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90
RSP: 0018:ffffc900034df2e0 EFLAGS: 00010293
RAX: ffffffff8288dfb1 RBX: 1ffff9200069be60 RCX: ffff888110555940
RDX: 0000000000000000 RSI: 0000004000000000 RDI: 0000000000000000
RBP: ffffc900034df410 R08: ffff8881b48c2f0f R09: 1ffff110369185e1
R10: dffffc0000000000 R11: ffffed10369185e2 R12: dffffc0000000000
R13: 0000004000000000 R14: 0000004610000000 R15: 1ffff11020c6fcc5
FS: 00007f033e9346c0(0000) GS:ffff8882a92a0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005556842f3058 CR3: 000000016d5c0000 CR4: 00000000000006f0
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.