* [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
From: Jiakai Xu @ 2026-05-08 6:01 UTC
To: linux-kernel, ocfs2-devel
Cc: Joel Becker, Joseph Qi, Kurt Hackel, Mark Fasheh, Jiakai Xu
A race condition exists between filesystem unmount and inode permission
operations. When ocfs2_dismount_volume() frees the ocfs2_super (osb)
structure, concurrent access via OCFS2_SB(inode->i_sb) in
ocfs2_inode_lock_full_nested() can dereference freed memory, causing a
page fault in __pv_queued_spin_lock_slowpath via
ocfs2_is_hard_readonly() -> spin_lock(&osb->osb_lock).

Fix this with two changes:

1. In ocfs2_dismount_volume(): set sb->s_fs_info = NULL before
   kfree(osb), so OCFS2_SB() returns NULL instead of a dangling pointer
   during the teardown race window.

2. In ocfs2_inode_lock_full_nested(): add a NULL check on osb after
   OCFS2_SB(), returning -EIO if the superblock info is already gone.
   This ensures the crash path is handled gracefully when the
   filesystem is being torn down.

Signed-off-by: Jiakai Xu <xujiakai24@mails.ucas.ac.cn>
Fixes: ccd979bdbce9f ("OCFS2: The Second Oracle Cluster Filesystem")
---
fs/ocfs2/dlmglue.c | 3 +++
fs/ocfs2/super.c   | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 7283bb2c5a31..cd619958a0a2 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2435,6 +2435,9 @@ int ocfs2_inode_lock_full_nested(struct inode *inode,
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct buffer_head *local_bh = NULL;

+ if (!osb)
+ return -EIO;
+
mlog(0, "inode %llu, take %s META lock\n",
(unsigned long long)OCFS2_I(inode)->ip_blkno,
ex ? "EXMODE" : "PRMODE");
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index b875f01c9756..3fd56638e4f0 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1881,10 +1881,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
printk(KERN_INFO "ocfs2: Unmounting device (%s) on (node %s)\n",
osb->dev_str, nodestr);

+ sb->s_fs_info = NULL;
ocfs2_delete_osb(osb);
kfree(osb);
sb->s_dev = 0;
- sb->s_fs_info = NULL;
}

static int ocfs2_setup_osb_uuid(struct ocfs2_super *osb, const unsigned char *uuid,
--
2.34.1

* Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
From: Joseph Qi @ 2026-05-08 9:47 UTC
To: Jiakai Xu, linux-kernel, ocfs2-devel
Cc: Joel Becker, Kurt Hackel, Mark Fasheh, Heming Zhao
On 5/8/26 2:01 PM, Jiakai Xu wrote:
> A race condition exists between filesystem unmount and inode permission
> operations. When ocfs2_dismount_volume() frees the ocfs2_super (osb)
> structure, concurrent access via OCFS2_SB(inode->i_sb) in
> ocfs2_inode_lock_full_nested() can dereference freed memory, causing a
> page fault in __pv_queued_spin_lock_slowpath via
> ocfs2_is_hard_readonly() -> spin_lock(&osb->osb_lock).
>
> Fix this with two changes:
>
> 1. In ocfs2_dismount_volume(): set sb->s_fs_info = NULL before
> kfree(osb), so OCFS2_SB() returns NULL instead of a dangling pointer
> during the teardown race window.
>
> 2. In ocfs2_inode_lock_full_nested(): add a NULL check on osb after
> OCFS2_SB(), returning -EIO if the superblock info is already gone.
> This ensures the crash path is handled gracefully when the
> filesystem is being torn down.
>
It seems this is not enough; a TOCTOU race still exists. Say:

Thread A                          Thread B
osb = OCFS2_SB(inode->i_sb)
                                  ocfs2_dismount_volume()
                                  -> sb->s_fs_info = NULL
                                  -> kfree(osb)
use freed osb

BTW, how did you find this issue?
Joseph
> Signed-off-by: Jiakai Xu <xujiakai24@mails.ucas.ac.cn>
> Fixes: ccd979bdbce9f ("OCFS2: The Second Oracle Cluster Filesystem")
> ---
> fs/ocfs2/dlmglue.c | 3 +++
> fs/ocfs2/super.c | 2 +-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 7283bb2c5a31..cd619958a0a2 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2435,6 +2435,9 @@ int ocfs2_inode_lock_full_nested(struct inode *inode,
> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> struct buffer_head *local_bh = NULL;
>
> + if (!osb)
> + return -EIO;
> +
> mlog(0, "inode %llu, take %s META lock\n",
> (unsigned long long)OCFS2_I(inode)->ip_blkno,
> ex ? "EXMODE" : "PRMODE");
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index b875f01c9756..3fd56638e4f0 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -1881,10 +1881,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
> printk(KERN_INFO "ocfs2: Unmounting device (%s) on (node %s)\n",
> osb->dev_str, nodestr);
>
> + sb->s_fs_info = NULL;
> ocfs2_delete_osb(osb);
> kfree(osb);
> sb->s_dev = 0;
> - sb->s_fs_info = NULL;
> }
>
> static int ocfs2_setup_osb_uuid(struct ocfs2_super *osb, const unsigned char *uuid,

* Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
From: Jiakai Xu @ 2026-05-09 4:28 UTC
To: joseph.qi
Cc: heming.zhao, jlbec, kurt.hackel, linux-kernel, mark, ocfs2-devel,
xujiakai24
> It seems this is not enough; a TOCTOU race still exists. Say:
>
> Thread A                          Thread B
> osb = OCFS2_SB(inode->i_sb)
>                                   ocfs2_dismount_volume()
>                                   -> sb->s_fs_info = NULL
>                                   -> kfree(osb)
> use freed osb
>
Hi Joseph,

Thank you very much for the review! You are absolutely right about the
TOCTOU issue — simply adding a NULL check after OCFS2_SB() cannot
prevent the race where thread A reads a valid osb pointer before thread
B frees it.

> BTW, how did you find this issue?

I found this issue through fuzzing. The crash report shows a page fault
at __pv_queued_spin_lock_slowpath via the call path:

	ocfs2_permission -> ocfs2_inode_lock_tracker ->
	ocfs2_inode_lock_full_nested -> ocfs2_is_hard_readonly ->
	spin_lock(&osb->osb_lock)

The fault address was in the kernel static data region, indicating that
the osb structure had been freed and its memory reused.

I have been thinking about a more robust fix and would like to get your
opinion on the following approach:

Currently, ocfs2_dismount_volume() is called from ocfs2_put_super(),
which runs inside generic_shutdown_super() while s_umount is still held.
The osb structure is freed at this point, but inodes with elevated
refcounts (e.g., held by inotify) survive evict_inodes() and may still
trigger filesystem operations (like ocfs2_permission) that access osb.

The idea is to move the osb cleanup out of ocfs2_dismount_volume() and
into an ocfs2-specific ->kill_sb() callback, so that the cleanup happens
after generic_shutdown_super() has completed and all concurrent VFS
operations have drained.

Specifically:

1. Remove ocfs2_delete_osb(), kfree(osb), and sb->s_fs_info = NULL from
   ocfs2_dismount_volume(). Keep all the subsystem shutdown (journal,
   dlm, recovery, quota, etc.) there.

2. Add a new ocfs2_kill_sb() that wraps kill_block_super():

	static void ocfs2_kill_sb(struct super_block *sb)
	{
		struct ocfs2_super *osb = OCFS2_SB(sb);

		kill_block_super(sb);
		// At this point generic_shutdown_super() has completed,
		// SB_DYING is set, and no new VFS operations can enter.

		if (osb) {
			ocfs2_delete_osb(osb);
			kfree(osb);
			sb->s_fs_info = NULL;
		}
	}

3. Update ocfs2_fs_type to use ocfs2_kill_sb instead of kill_block_super.

4. The NULL check in ocfs2_inode_lock_full_nested() can optionally be
   kept as a defense-in-depth measure, though it is no longer strictly
   necessary if the life-cycle ordering is correct.

This pattern is similar to ext4 — ext4_kill_sb() calls kill_block_super()
first and then handles cleanup after (e.g., journal_bdev_file).

Does this approach make sense?

Best regards,
Jiakai

* Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
From: Joseph Qi @ 2026-05-09 6:20 UTC
To: Jiakai Xu
Cc: heming.zhao, jlbec, kurt.hackel, linux-kernel, mark, ocfs2-devel
On 5/9/26 12:28 PM, Jiakai Xu wrote:
>> It seems this is not enough; a TOCTOU race still exists. Say:
>>
>> Thread A                          Thread B
>> osb = OCFS2_SB(inode->i_sb)
>>                                   ocfs2_dismount_volume()
>>                                   -> sb->s_fs_info = NULL
>>                                   -> kfree(osb)
>> use freed osb
>>
>
> Hi Joseph,
>
> Thank you very much for the review! You are absolutely right about the
> TOCTOU issue — simply adding a NULL check after OCFS2_SB() cannot
> prevent the race where thread A reads a valid osb pointer before thread
> B frees it.
>
>> BTW, how did you find this issue?
>
> I found this issue through fuzzing. The crash report shows a page fault
> at __pv_queued_spin_lock_slowpath via the call path:
>
> ocfs2_permission -> ocfs2_inode_lock_tracker ->
> ocfs2_inode_lock_full_nested -> ocfs2_is_hard_readonly ->
> spin_lock(&osb->osb_lock)
What is the operation?
We expect that no operation can access the filesystem during filesystem shutdown.
>
> The fault address was in the kernel static data region, indicating that
> the osb structure had been freed and its memory reused.
>
> I have been thinking about a more robust fix and would like to get your
> opinion on the following approach:
>
> Currently, ocfs2_dismount_volume() is called from ocfs2_put_super(),
> which runs inside generic_shutdown_super() while s_umount is still held.
> The osb structure is freed at this point, but inodes with elevated
> refcounts (e.g., held by inotify) survive evict_inodes() and may still
> trigger filesystem operations (like ocfs2_permission) that access osb.
>
> The idea is to move the osb cleanup out of ocfs2_dismount_volume() and
> into an ocfs2-specific ->kill_sb() callback, so that the cleanup happens
> after generic_shutdown_super() has completed and all concurrent VFS
> operations have drained.
>
> Specifically:
>
> 1. Remove ocfs2_delete_osb(), kfree(osb), and sb->s_fs_info = NULL from
> ocfs2_dismount_volume(). Keep all the subsystem shutdown (journal,
> dlm, recovery, quota, etc.) there.
>
> 2. Add a new ocfs2_kill_sb() that wraps kill_block_super():
>
> 	static void ocfs2_kill_sb(struct super_block *sb)
> 	{
> 		struct ocfs2_super *osb = OCFS2_SB(sb);
>
> 		kill_block_super(sb);
> 		// At this point generic_shutdown_super() has completed,
> 		// SB_DYING is set, and no new VFS operations can enter.
>
> 		if (osb) {
> 			ocfs2_delete_osb(osb);
> 			kfree(osb);
> 			sb->s_fs_info = NULL;
> 		}
> 	}
>
> 3. Update ocfs2_fs_type to use ocfs2_kill_sb instead of kill_block_super.
>
> 4. The NULL check in ocfs2_inode_lock_full_nested() can optionally be
> kept as a defense-in-depth measure, though it is no longer strictly
> necessary if the life-cycle ordering is correct.
>
> This pattern is similar to ext4 — ext4_kill_sb() calls kill_block_super()
> first and then handles cleanup after (e.g., journal_bdev_file).
>
> Does this approach make sense?
>
generic_shutdown_super() clears SB_ACTIVE, so it seems we can check
this flag.

Thanks,
Joseph

* Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
From: Jiakai Xu @ 2026-05-09 8:46 UTC
To: joseph.qi
Cc: heming.zhao, jlbec, kurt.hackel, linux-kernel, mark, ocfs2-devel,
xujiakai24
> What is the operation?
> We expect that no operation can access the filesystem during filesystem shutdown.

Here is the full crash report produced by the fuzzer:

BUG: unable to handle page fault for address: ffffffff1315afd0
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 6e6a067 P4D 6e6b067 PUD 0
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 12119 Comm: syz.2.132 Not tainted 6.18.5 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:__pv_queued_spin_lock_slowpath+0x109/0x430 home/zzzrrll/tmp/linux/kernel/locking/qspinlock.c:288
Code: 9a 00 00 00 0f b7 c8 81 e1 fc ff 00 00 83 e0 03 48 c1 e0 05 4c 8d a8 00 c4 6b 89 48 c7 c2 f8 ff ff ff 48 8b ac 4a 90 0d ab 86 <48> 89 9c 05 00 c4 6b 89 b8 00 80 00 00 45 31 f6 eb 23 41 80 7c 2d
RSP: 0018:ffa000000da9bcc0 EFLAGS: 00010216
RAX: 0000000000000060 RBX: ff1100007da2c400 RCX: 0000000000008584
RDX: fffffffffffffff8 RSI: 0000000085873528 RDI: 0000000000040000
RBP: ffffffff89a9eb70 R08: ff1100007da2c414 R09: 0000000000000000
R10: 0000000000000002 R11: ffffffff823c6ad0 R12: 0000000000000000
R13: ffffffff896bc460 R14: ff110000f4370000 R15: ff1100007ba096c8
FS: 00007fb3ffc0a640(0000) GS:ff110000f4370000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff1315afd0 CR3: 000000003f598000 CR4: 0000000000751ef0
PKRU: 80000000
Call Trace:
<TASK>
pv_queued_spin_lock_slowpath home/zzzrrll/tmp/linux/include/asm-generic/qspinlock.h:111 [inline]
queued_spin_lock_slowpath home/zzzrrll/tmp/linux/arch/x86/include/asm/qspinlock.h:51 [inline]
queued_spin_lock home/zzzrrll/tmp/linux/include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock home/zzzrrll/tmp/linux/include/linux/spinlock.h:187 [inline]
__raw_spin_lock home/zzzrrll/tmp/linux/include/linux/spinlock_api_smp.h:134 [inline]
_raw_spin_lock+0x31/0x40 home/zzzrrll/tmp/linux/kernel/locking/spinlock.c:154
spin_lock home/zzzrrll/tmp/linux/include/linux/spinlock.h:351 [inline]
ocfs2_is_hard_readonly home/zzzrrll/tmp/linux/fs/ocfs2/ocfs2.h:665 [inline]
ocfs2_inode_lock_full_nested+0x5c/0xca0 home/zzzrrll/tmp/linux/fs/ocfs2/dlmglue.c:2446
ocfs2_inode_lock_tracker+0xd8/0x400 home/zzzrrll/tmp/linux/fs/ocfs2/dlmglue.c:2691
ocfs2_permission+0x75/0x130 home/zzzrrll/tmp/linux/fs/ocfs2/file.c:1349
do_inode_permission home/zzzrrll/tmp/linux/fs/namei.c:526 [inline]
inode_permission+0x1b4/0x2d0 home/zzzrrll/tmp/linux/fs/namei.c:593
path_permission home/zzzrrll/tmp/linux/include/linux/fs.h:3086 [inline]
inotify_find_inode home/zzzrrll/tmp/linux/fs/notify/inotify/inotify_user.c:381 [inline]
__do_sys_inotify_add_watch home/zzzrrll/tmp/linux/fs/notify/inotify/inotify_user.c:771 [inline]
__se_sys_inotify_add_watch+0x146/0x650 home/zzzrrll/tmp/linux/fs/notify/inotify/inotify_user.c:729
do_syscall_x64 home/zzzrrll/tmp/linux/arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xc6/0xfa0 home/zzzrrll/tmp/linux/arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb3fedae16d
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb3ffc09f98 EFLAGS: 00000246 ORIG_RAX: 00000000000000fe
RAX: ffffffffffffffda RBX: 00007fb3feff5fa0 RCX: 00007fb3fedae16d
RDX: 0000000004000000 RSI: 0000200000000080 RDI: 0000000000000004
RBP: 00007fb3fee480f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fb3feff6038 R14: 00007fb3feff5fa0 R15: 00007fb3ffbea000
</TASK>
Modules linked in:
CR2: ffffffff1315afd0
---[ end trace 0000000000000000 ]---
RIP: 0010:__pv_queued_spin_lock_slowpath+0x109/0x430 home/zzzrrll/tmp/linux/kernel/locking/qspinlock.c:288
Code: 9a 00 00 00 0f b7 c8 81 e1 fc ff 00 00 83 e0 03 48 c1 e0 05 4c 8d a8 00 c4 6b 89 48 c7 c2 f8 ff ff ff 48 8b ac 4a 90 0d ab 86 <48> 89 9c 05 00 c4 6b 89 b8 00 80 00 00 45 31 f6 eb 23 41 80 7c 2d
RSP: 0018:ffa000000da9bcc0 EFLAGS: 00010216
RAX: 0000000000000060 RBX: ff1100007da2c400 RCX: 0000000000008584
RDX: fffffffffffffff8 RSI: 0000000085873528 RDI: 0000000000040000
RBP: ffffffff89a9eb70 R08: ff1100007da2c414 R09: 0000000000000000
R10: 0000000000000002 R11: ffffffff823c6ad0 R12: 0000000000000000
R13: ffffffff896bc460 R14: ff110000f4370000 R15: ff1100007ba096c8
FS: 00007fb3ffc0a640(0000) GS:ff110000f4370000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff1315afd0 CR3: 000000003f598000 CR4: 0000000000751ef0
PKRU: 80000000
----------------
Code disassembly (best guess), 1 bytes skipped:
0: 00 00 add %al,(%rax)
2: 00 0f add %cl,(%rdi)
4: b7 c8 mov $0xc8,%bh
6: 81 e1 fc ff 00 00 and $0xfffc,%ecx
c: 83 e0 03 and $0x3,%eax
f: 48 c1 e0 05 shl $0x5,%rax
13: 4c 8d a8 00 c4 6b 89 lea -0x76943c00(%rax),%r13
1a: 48 c7 c2 f8 ff ff ff mov $0xfffffffffffffff8,%rdx
21: 48 8b ac 4a 90 0d ab mov -0x7954f270(%rdx,%rcx,2),%rbp
28: 86
* 29: 48 89 9c 05 00 c4 6b mov %rbx,-0x76943c00(%rbp,%rax,1) <-- trapping instruction
30: 89
31: b8 00 80 00 00 mov $0x8000,%eax
36: 45 31 f6 xor %r14d,%r14d
39: eb 23 jmp 0x5e
3b: 41 rex.B
3c: 80 .byte 0x80
3d: 7c 2d jl 0x6c
Jiakai