[PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full

The Linux Kernel Mailing List

* [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
@ 2026-05-08  6:01 Jiakai Xu
  2026-05-08  9:47 ` Joseph Qi
  0 siblings, 1 reply; 5+ messages in thread
From: Jiakai Xu @ 2026-05-08  6:01 UTC
  To: linux-kernel, ocfs2-devel
  Cc: Joel Becker, Joseph Qi, Kurt Hackel, Mark Fasheh, Jiakai Xu

A race condition exists between filesystem unmount and inode permission
operations. When ocfs2_dismount_volume() frees the ocfs2_super (osb)
structure, concurrent access via OCFS2_SB(inode->i_sb) in
ocfs2_inode_lock_full_nested() can dereference freed memory, causing a
page fault in __pv_queued_spin_lock_slowpath via
ocfs2_is_hard_readonly() -> spin_lock(&osb->osb_lock).

Fix this with two changes:

1. In ocfs2_dismount_volume(): set sb->s_fs_info = NULL before
   kfree(osb), so OCFS2_SB() returns NULL instead of a dangling pointer
   during the teardown race window.

2. In ocfs2_inode_lock_full_nested(): add a NULL check on osb after
   OCFS2_SB(), returning -EIO if the superblock info is already gone.
   This ensures the crash path is handled gracefully when the
   filesystem is being torn down.

Signed-off-by: Jiakai Xu <xujiakai24@mails.ucas.ac.cn>
Fixes: ccd979bdbce9f ("OCFS2: The Second Oracle Cluster Filesystem")
---
 fs/ocfs2/dlmglue.c | 3 +++
 fs/ocfs2/super.c   | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 7283bb2c5a31..cd619958a0a2 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2435,6 +2435,9 @@ int ocfs2_inode_lock_full_nested(struct inode *inode,
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	struct buffer_head *local_bh = NULL;

+	if (!osb)
+		return -EIO;
+
 	mlog(0, "inode %llu, take %s META lock\n",
 	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
 	     ex ? "EXMODE" : "PRMODE");
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index b875f01c9756..3fd56638e4f0 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1881,10 +1881,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 	printk(KERN_INFO "ocfs2: Unmounting device (%s) on (node %s)\n",
 	       osb->dev_str, nodestr);

+	sb->s_fs_info = NULL;
 	ocfs2_delete_osb(osb);
 	kfree(osb);
 	sb->s_dev = 0;
-	sb->s_fs_info = NULL;
 }

 static int ocfs2_setup_osb_uuid(struct ocfs2_super *osb, const unsigned char *uuid,
-- 
2.34.1


* Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
  2026-05-08  6:01 [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount Jiakai Xu
@ 2026-05-08  9:47 ` Joseph Qi
  2026-05-09  4:28   ` Jiakai Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Joseph Qi @ 2026-05-08  9:47 UTC
  To: Jiakai Xu, linux-kernel, ocfs2-devel
  Cc: Joel Becker, Kurt Hackel, Mark Fasheh, Heming Zhao



On 5/8/26 2:01 PM, Jiakai Xu wrote:
> A race condition exists between filesystem unmount and inode permission
> operations. When ocfs2_dismount_volume() frees the ocfs2_super (osb)
> structure, concurrent access via OCFS2_SB(inode->i_sb) in
> ocfs2_inode_lock_full_nested() can dereference freed memory, causing a
> page fault in __pv_queued_spin_lock_slowpath via
> ocfs2_is_hard_readonly() -> spin_lock(&osb->osb_lock).
> 
> Fix this with two changes:
> 
> 1. In ocfs2_dismount_volume(): set sb->s_fs_info = NULL before
>    kfree(osb), so OCFS2_SB() returns NULL instead of a dangling pointer
>    during the teardown race window.
> 
> 2. In ocfs2_inode_lock_full_nested(): add a NULL check on osb after
>    OCFS2_SB(), returning -EIO if the superblock info is already gone.
>    This ensures the crash path is handled gracefully when the
>    filesystem is being torn down.
> 

This does not seem to be enough: a TOCTOU window still exists. For example:

Thread A			Thread B
osb = OCFS2_SB(inode->i_sb)
				ocfs2_dismount_volume()
				-> sb->s_fs_info = NULL
				-> kfree(osb)
use freed osb
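
The interleaving above can be replayed in a small single-threaded userspace model. This is only a sketch: `model_sb`, `model_osb`, and the helper names are hypothetical stand-ins, not ocfs2 code, and the "free" is modelled with a flag so the example stays well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct super_block and struct ocfs2_super. */
struct model_osb {
	int freed;	/* set instead of calling free(), to avoid real UB */
};

struct model_sb {
	struct model_osb *s_fs_info;
};

/* Thread A, step 1: snapshot the pointer, as OCFS2_SB(inode->i_sb) does. */
struct model_osb *reader_snapshot(const struct model_sb *sb)
{
	return sb->s_fs_info;
}

/* Thread B: the patched teardown order, NULL first and then "free". */
void model_dismount(struct model_sb *sb)
{
	struct model_osb *osb = sb->s_fs_info;

	assert(osb != NULL);
	sb->s_fs_info = NULL;
	osb->freed = 1;
}

/* Thread A, step 2: the NULL check from the patch; 1 means "proceed". */
int null_check_passes(const struct model_osb *osb)
{
	return osb != NULL;
}

/* Reader snapshots before teardown: the check passes on a released object. */
int replay_early_reader(void)
{
	struct model_osb osb = { 0 };
	struct model_sb sb = { &osb };
	struct model_osb *snap = reader_snapshot(&sb);	/* Thread A */

	model_dismount(&sb);				/* Thread B */
	return null_check_passes(snap) && snap->freed;
}

/* Reader snapshots after teardown: the check catches it. */
int replay_late_reader(void)
{
	struct model_osb osb = { 0 };
	struct model_sb sb = { &osb };

	model_dismount(&sb);
	return null_check_passes(reader_snapshot(&sb));
}
```

In this model the NULL check only helps a reader that arrives after `s_fs_info` has been cleared; a reader that took its snapshot earlier still proceeds with a stale pointer.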

BTW, how did you find this issue?

Joseph

> Signed-off-by: Jiakai Xu <xujiakai24@mails.ucas.ac.cn>
> Fixes: ccd979bdbce9f ("OCFS2: The Second Oracle Cluster Filesystem")
> ---
>  fs/ocfs2/dlmglue.c | 3 +++
>  fs/ocfs2/super.c   | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 7283bb2c5a31..cd619958a0a2 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2435,6 +2435,9 @@ int ocfs2_inode_lock_full_nested(struct inode *inode,
>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>  	struct buffer_head *local_bh = NULL;
>  
> +	if (!osb)
> +		return -EIO;
> +
>  	mlog(0, "inode %llu, take %s META lock\n",
>  	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
>  	     ex ? "EXMODE" : "PRMODE");
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index b875f01c9756..3fd56638e4f0 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -1881,10 +1881,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
>  	printk(KERN_INFO "ocfs2: Unmounting device (%s) on (node %s)\n",
>  	       osb->dev_str, nodestr);
>  
> +	sb->s_fs_info = NULL;
>  	ocfs2_delete_osb(osb);
>  	kfree(osb);
>  	sb->s_dev = 0;
> -	sb->s_fs_info = NULL;
>  }
>  
>  static int ocfs2_setup_osb_uuid(struct ocfs2_super *osb, const unsigned char *uuid,



* Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
  2026-05-08  9:47 ` Joseph Qi
@ 2026-05-09  4:28   ` Jiakai Xu
  2026-05-09  6:20     ` Joseph Qi
  0 siblings, 1 reply; 5+ messages in thread
From: Jiakai Xu @ 2026-05-09  4:28 UTC
  To: joseph.qi
  Cc: heming.zhao, jlbec, kurt.hackel, linux-kernel, mark, ocfs2-devel,
	xujiakai24

> This does not seem to be enough: a TOCTOU window still exists. For example:
> 
> Thread A			Thread B
> osb = OCFS2_SB(inode->i_sb)
> 				ocfs2_dismount_volume()
> 				-> sb->s_fs_info = NULL
> 				-> kfree(osb)
> use freed osb
> 

Hi Joseph,

Thank you very much for the review! You are absolutely right about the
TOCTOU issue — simply adding a NULL check after OCFS2_SB() cannot
prevent the race where thread A reads a valid osb pointer before thread
B frees it.

> BTW, how did you find this issue?

I found this issue through fuzzing. The crash report shows a page fault 
at __pv_queued_spin_lock_slowpath via the call path:

  ocfs2_permission -> ocfs2_inode_lock_tracker ->
  ocfs2_inode_lock_full_nested -> ocfs2_is_hard_readonly ->
  spin_lock(&osb->osb_lock)

The fault was a write to a not-present page at a kernel-range address,
which is consistent with the osb structure having been freed and its
memory reused.

I have been thinking about a more robust fix and would like to get your
opinion on the following approach:

Currently, ocfs2_dismount_volume() is called from ocfs2_put_super(),
which runs inside generic_shutdown_super() while s_umount is still held.
The osb structure is freed at this point, but inodes with elevated
refcounts (e.g., held by inotify) survive evict_inodes() and may still
trigger filesystem operations (like ocfs2_permission) that access osb.

The idea is to move the osb cleanup out of ocfs2_dismount_volume() and
into an ocfs2-specific ->kill_sb() callback, so that the cleanup happens
after generic_shutdown_super() has completed and all concurrent VFS
operations have drained.

Specifically:

1. Remove ocfs2_delete_osb(), kfree(osb), and sb->s_fs_info = NULL from
   ocfs2_dismount_volume(). Keep all the subsystem shutdown (journal,
   dlm, recovery, quota, etc.) there.

2. Add a new ocfs2_kill_sb() that wraps kill_block_super():

   static void ocfs2_kill_sb(struct super_block *sb)
   {
       struct ocfs2_super *osb = OCFS2_SB(sb);

       kill_block_super(sb);
       /*
        * At this point generic_shutdown_super() has completed,
        * SB_DYING is set, and no new VFS operations can enter.
        */

       if (osb) {
           ocfs2_delete_osb(osb);
           kfree(osb);
           sb->s_fs_info = NULL;
       }
   }

3. Update ocfs2_fs_type to use ocfs2_kill_sb instead of kill_block_super.

4. The NULL check in ocfs2_inode_lock_full_nested() can optionally be
   kept as a defense-in-depth measure, though it is no longer strictly
   necessary if the life-cycle ordering is correct.

This pattern is similar to ext4 — ext4_kill_sb() calls kill_block_super()
first and then handles cleanup after (e.g., journal_bdev_file).

Does this approach make sense?

Best regards,
Jiakai


* Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
  2026-05-09  4:28   ` Jiakai Xu
@ 2026-05-09  6:20     ` Joseph Qi
  2026-05-09  8:46       ` Jiakai Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Joseph Qi @ 2026-05-09  6:20 UTC
  To: Jiakai Xu
  Cc: heming.zhao, jlbec, kurt.hackel, linux-kernel, mark, ocfs2-devel



On 5/9/26 12:28 PM, Jiakai Xu wrote:
>> This does not seem to be enough: a TOCTOU window still exists. For example:
>>
>> Thread A			Thread B
>> osb = OCFS2_SB(inode->i_sb)
>> 				ocfs2_dismount_volume()
>> 				-> sb->s_fs_info = NULL
>> 				-> kfree(osb)
>> use freed osb
>>
> 
> Hi Joseph,
> 
> Thank you very much for the review! You are absolutely right about the
> TOCTOU issue — simply adding a NULL check after OCFS2_SB() cannot
> prevent the race where thread A reads a valid osb pointer before thread
> B frees it.
> 
>> BTW, how did you find this issue?
> 
> I found this issue through fuzzing. The crash report shows a page fault 
> at __pv_queued_spin_lock_slowpath via the call path:
> 
>   ocfs2_permission -> ocfs2_inode_lock_tracker ->
>   ocfs2_inode_lock_full_nested -> ocfs2_is_hard_readonly ->
>   spin_lock(&osb->osb_lock)

What is the operation?
We expect that no operation can access the filesystem while it is shutting down.

> 
> The fault was a write to a not-present page at a kernel-range address,
> which is consistent with the osb structure having been freed and its
> memory reused.
> 
> I have been thinking about a more robust fix and would like to get your
> opinion on the following approach:
> 
> Currently, ocfs2_dismount_volume() is called from ocfs2_put_super(),
> which runs inside generic_shutdown_super() while s_umount is still held.
> The osb structure is freed at this point, but inodes with elevated
> refcounts (e.g., held by inotify) survive evict_inodes() and may still
> trigger filesystem operations (like ocfs2_permission) that access osb.
> 
> The idea is to move the osb cleanup out of ocfs2_dismount_volume() and
> into an ocfs2-specific ->kill_sb() callback, so that the cleanup happens
> after generic_shutdown_super() has completed and all concurrent VFS
> operations have drained.
> 
> Specifically:
> 
> 1. Remove ocfs2_delete_osb(), kfree(osb), and sb->s_fs_info = NULL from
>    ocfs2_dismount_volume(). Keep all the subsystem shutdown (journal,
>    dlm, recovery, quota, etc.) there.
> 
> 2. Add a new ocfs2_kill_sb() that wraps kill_block_super():
> 
>    static void ocfs2_kill_sb(struct super_block *sb)
>    {
>        struct ocfs2_super *osb = OCFS2_SB(sb);
> 
>        kill_block_super(sb);
>        /*
>         * At this point generic_shutdown_super() has completed,
>         * SB_DYING is set, and no new VFS operations can enter.
>         */
> 
>        if (osb) {
>            ocfs2_delete_osb(osb);
>            kfree(osb);
>            sb->s_fs_info = NULL;
>        }
>    }
> 
> 3. Update ocfs2_fs_type to use ocfs2_kill_sb instead of kill_block_super.
> 
> 4. The NULL check in ocfs2_inode_lock_full_nested() can optionally be
>    kept as a defense-in-depth measure, though it is no longer strictly
>    necessary if the life-cycle ordering is correct.
> 
> This pattern is similar to ext4 — ext4_kill_sb() calls kill_block_super()
> first and then handles cleanup after (e.g., journal_bdev_file).
> 
> Does this approach make sense?
> 

generic_shutdown_super() clears SB_ACTIVE,
so it seems we can check this flag.
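
A userspace sketch of what such a flag check could look like is below. Everything in it is an assumption for illustration: `model_sb`, `MODEL_SB_ACTIVE`, `model_shutdown()`, and `model_inode_lock()` are hypothetical names, and the `-EIO` return value is borrowed from the original patch rather than settled in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bit; a stand-in for the kernel's SB_ACTIVE. */
#define MODEL_SB_ACTIVE (1UL << 30)

/* Hypothetical stand-in for struct super_block. */
struct model_sb {
	unsigned long s_flags;
};

/* Models the step in generic_shutdown_super() that clears the flag. */
void model_shutdown(struct model_sb *sb)
{
	sb->s_flags &= ~MODEL_SB_ACTIVE;
}

/* Models an early bail-out at the top of the inode lock path. */
int model_inode_lock(const struct model_sb *sb)
{
	if (!(sb->s_flags & MODEL_SB_ACTIVE))
		return -EIO;	/* superblock is shutting down */

	/* ... the real function would go on to take the cluster lock ... */
	return 0;
}

/* Walks through both states once. */
void model_demo(void)
{
	struct model_sb sb = { MODEL_SB_ACTIVE };

	assert(model_inode_lock(&sb) == 0);
	model_shutdown(&sb);
	assert(model_inode_lock(&sb) == -EIO);
}
```

Whether a flag test on its own closes the window depends on what orders it against the free of osb; the thread does not settle that point.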

Thanks,
Joseph


* Re: [PATCH] ocfs2: fix use-after-free in ocfs2_inode_lock_full_nested during unmount
  2026-05-09  6:20     ` Joseph Qi
@ 2026-05-09  8:46       ` Jiakai Xu
  0 siblings, 0 replies; 5+ messages in thread
From: Jiakai Xu @ 2026-05-09  8:46 UTC
  To: joseph.qi
  Cc: heming.zhao, jlbec, kurt.hackel, linux-kernel, mark, ocfs2-devel,
	xujiakai24

> What is the operation?
> We expect that no operation can access the filesystem while it is shutting down.

Here is the full crash report produced by the fuzzer:

BUG: unable to handle page fault for address: ffffffff1315afd0
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 6e6a067 P4D 6e6b067 PUD 0 
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 12119 Comm: syz.2.132 Not tainted 6.18.5 #1 PREEMPT(full) 
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:__pv_queued_spin_lock_slowpath+0x109/0x430 home/zzzrrll/tmp/linux/kernel/locking/qspinlock.c:288
Code: 9a 00 00 00 0f b7 c8 81 e1 fc ff 00 00 83 e0 03 48 c1 e0 05 4c 8d a8 00 c4 6b 89 48 c7 c2 f8 ff ff ff 48 8b ac 4a 90 0d ab 86 <48> 89 9c 05 00 c4 6b 89 b8 00 80 00 00 45 31 f6 eb 23 41 80 7c 2d
RSP: 0018:ffa000000da9bcc0 EFLAGS: 00010216
RAX: 0000000000000060 RBX: ff1100007da2c400 RCX: 0000000000008584
RDX: fffffffffffffff8 RSI: 0000000085873528 RDI: 0000000000040000
RBP: ffffffff89a9eb70 R08: ff1100007da2c414 R09: 0000000000000000
R10: 0000000000000002 R11: ffffffff823c6ad0 R12: 0000000000000000
R13: ffffffff896bc460 R14: ff110000f4370000 R15: ff1100007ba096c8
FS:  00007fb3ffc0a640(0000) GS:ff110000f4370000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff1315afd0 CR3: 000000003f598000 CR4: 0000000000751ef0
PKRU: 80000000
Call Trace:
 <TASK>
 pv_queued_spin_lock_slowpath home/zzzrrll/tmp/linux/include/asm-generic/qspinlock.h:111 [inline]
 queued_spin_lock_slowpath home/zzzrrll/tmp/linux/arch/x86/include/asm/qspinlock.h:51 [inline]
 queued_spin_lock home/zzzrrll/tmp/linux/include/asm-generic/qspinlock.h:114 [inline]
 do_raw_spin_lock home/zzzrrll/tmp/linux/include/linux/spinlock.h:187 [inline]
 __raw_spin_lock home/zzzrrll/tmp/linux/include/linux/spinlock_api_smp.h:134 [inline]
 _raw_spin_lock+0x31/0x40 home/zzzrrll/tmp/linux/kernel/locking/spinlock.c:154
 spin_lock home/zzzrrll/tmp/linux/include/linux/spinlock.h:351 [inline]
 ocfs2_is_hard_readonly home/zzzrrll/tmp/linux/fs/ocfs2/ocfs2.h:665 [inline]
 ocfs2_inode_lock_full_nested+0x5c/0xca0 home/zzzrrll/tmp/linux/fs/ocfs2/dlmglue.c:2446
 ocfs2_inode_lock_tracker+0xd8/0x400 home/zzzrrll/tmp/linux/fs/ocfs2/dlmglue.c:2691
 ocfs2_permission+0x75/0x130 home/zzzrrll/tmp/linux/fs/ocfs2/file.c:1349
 do_inode_permission home/zzzrrll/tmp/linux/fs/namei.c:526 [inline]
 inode_permission+0x1b4/0x2d0 home/zzzrrll/tmp/linux/fs/namei.c:593
 path_permission home/zzzrrll/tmp/linux/include/linux/fs.h:3086 [inline]
 inotify_find_inode home/zzzrrll/tmp/linux/fs/notify/inotify/inotify_user.c:381 [inline]
 __do_sys_inotify_add_watch home/zzzrrll/tmp/linux/fs/notify/inotify/inotify_user.c:771 [inline]
 __se_sys_inotify_add_watch+0x146/0x650 home/zzzrrll/tmp/linux/fs/notify/inotify/inotify_user.c:729
 do_syscall_x64 home/zzzrrll/tmp/linux/arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc6/0xfa0 home/zzzrrll/tmp/linux/arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb3fedae16d
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb3ffc09f98 EFLAGS: 00000246 ORIG_RAX: 00000000000000fe
RAX: ffffffffffffffda RBX: 00007fb3feff5fa0 RCX: 00007fb3fedae16d
RDX: 0000000004000000 RSI: 0000200000000080 RDI: 0000000000000004
RBP: 00007fb3fee480f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fb3feff6038 R14: 00007fb3feff5fa0 R15: 00007fb3ffbea000
 </TASK>
Modules linked in:
CR2: ffffffff1315afd0
---[ end trace 0000000000000000 ]---
RIP: 0010:__pv_queued_spin_lock_slowpath+0x109/0x430 home/zzzrrll/tmp/linux/kernel/locking/qspinlock.c:288
Code: 9a 00 00 00 0f b7 c8 81 e1 fc ff 00 00 83 e0 03 48 c1 e0 05 4c 8d a8 00 c4 6b 89 48 c7 c2 f8 ff ff ff 48 8b ac 4a 90 0d ab 86 <48> 89 9c 05 00 c4 6b 89 b8 00 80 00 00 45 31 f6 eb 23 41 80 7c 2d
RSP: 0018:ffa000000da9bcc0 EFLAGS: 00010216
RAX: 0000000000000060 RBX: ff1100007da2c400 RCX: 0000000000008584
RDX: fffffffffffffff8 RSI: 0000000085873528 RDI: 0000000000040000
RBP: ffffffff89a9eb70 R08: ff1100007da2c414 R09: 0000000000000000
R10: 0000000000000002 R11: ffffffff823c6ad0 R12: 0000000000000000
R13: ffffffff896bc460 R14: ff110000f4370000 R15: ff1100007ba096c8
FS:  00007fb3ffc0a640(0000) GS:ff110000f4370000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff1315afd0 CR3: 000000003f598000 CR4: 0000000000751ef0
PKRU: 80000000
----------------
Code disassembly (best guess), 1 bytes skipped:
   0:	00 00                	add    %al,(%rax)
   2:	00 0f                	add    %cl,(%rdi)
   4:	b7 c8                	mov    $0xc8,%bh
   6:	81 e1 fc ff 00 00    	and    $0xfffc,%ecx
   c:	83 e0 03             	and    $0x3,%eax
   f:	48 c1 e0 05          	shl    $0x5,%rax
  13:	4c 8d a8 00 c4 6b 89 	lea    -0x76943c00(%rax),%r13
  1a:	48 c7 c2 f8 ff ff ff 	mov    $0xfffffffffffffff8,%rdx
  21:	48 8b ac 4a 90 0d ab 	mov    -0x7954f270(%rdx,%rcx,2),%rbp
  28:	86
* 29:	48 89 9c 05 00 c4 6b 	mov    %rbx,-0x76943c00(%rbp,%rax,1) <-- trapping instruction
  30:	89
  31:	b8 00 80 00 00       	mov    $0x8000,%eax
  36:	45 31 f6             	xor    %r14d,%r14d
  39:	eb 23                	jmp    0x5e
  3b:	41                   	rex.B
  3c:	80                   	.byte 0x80
  3d:	7c 2d                	jl     0x6c

Jiakai

