[syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree

Linux filesystem development

* [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree
@ 2026-06-17 17:08 syzbot
  2026-06-18 18:44 ` rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree] Jann Horn
  0 siblings, 1 reply; 13+ messages in thread
From: syzbot @ 2026-06-17 17:08 UTC (permalink / raw)
  To: brauner, jack, linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot found the following issue on:

HEAD commit:    c425609d6ac4 Add linux-next specific files for 20260612
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12864986580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d7a56b1e89b63439
dashboard link: https://syzkaller.appspot.com/bug?extid=000c800a02097aaa10ed
compiler:       Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/7fab9a8df61a/disk-c425609d.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c2577196651b/vmlinux-c425609d.xz
kernel image: https://storage.googleapis.com/syzbot-assets/053557a7471e/bzImage-c425609d.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-use-after-free in __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:132 [inline]
BUG: KASAN: slab-use-after-free in _raw_spin_lock_irqsave+0x40/0x60 kernel/locking/spinlock.c:166
Read of size 1 at addr ffff8880400e5570 by task syz-executor/5618

CPU: 0 UID: 0 PID: 5618 Comm: syz-executor Not tainted syzkaller #0 PREEMPT_{RT,(full)} 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 __kasan_check_byte+0x2a/0x40 mm/kasan/common.c:574
 kasan_check_byte include/linux/kasan.h:402 [inline]
 lock_acquire+0x84/0x350 kernel/locking/lockdep.c:5844
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:132 [inline]
 _raw_spin_lock_irqsave+0x40/0x60 kernel/locking/spinlock.c:166
 rt_mutex_slowunlock+0xbf/0xa20 kernel/locking/rtmutex.c:1430
 spin_unlock include/linux/spinlock_rt.h:109 [inline]
 shrink_dcache_tree+0x30e/0x410 fs/dcache.c:1754
 vfs_rmdir+0x425/0x6b0 fs/namei.c:5381
 filename_rmdir+0x292/0x520 fs/namei.c:5434
 __do_sys_unlinkat fs/namei.c:5609 [inline]
 __se_sys_unlinkat+0x71/0x1a0 fs/namei.c:5602
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe005bdbf77
Code: 77 01 c3 48 c7 c2 e8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffef7890fe8 EFLAGS: 00000207 ORIG_RAX: 0000000000000107
RAX: ffffffffffffffda RBX: 0000000000000065 RCX: 00007fe005bdbf77
RDX: 0000000000000200 RSI: 00007ffef7892130 RDI: 00000000ffffff9c
RBP: 00007fe005c721ca R08: 00000000000065c0 R09: 00000000ffffffff
R10: 0000000000000100 R11: 0000000000000207 R12: 00007ffef7892130
R13: 00007fe005c721ca R14: 0000000000022281 R15: 00007ffef7892170
 </TASK>

Allocated by task 6103:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4610 [inline]
 slab_alloc_node mm/slub.c:4943 [inline]
 kmem_cache_alloc_lru_noprof+0x347/0x6a0 mm/slub.c:4976
 __d_alloc+0x37/0x6f0 fs/dcache.c:1902
 d_alloc_parallel+0xde/0x16c0 fs/dcache.c:2761
 lookup_open fs/namei.c:4423 [inline]
 open_last_lookups fs/namei.c:4608 [inline]
 path_openat+0xbf0/0x3850 fs/namei.c:4856
 do_file_open+0x23e/0x4a0 fs/namei.c:4888
 do_sys_openat2+0x115/0x200 fs/open.c:1368
 do_sys_open fs/open.c:1374 [inline]
 __do_sys_openat fs/open.c:1390 [inline]
 __se_sys_openat fs/open.c:1385 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1385
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 29:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2703 [inline]
 slab_free mm/slub.c:6402 [inline]
 kmem_cache_free+0x187/0x6c0 mm/slub.c:6529
 rcu_do_batch kernel/rcu/tree.c:2645 [inline]
 rcu_core kernel/rcu/tree.c:2897 [inline]
 rcu_cpu_kthread+0x950/0x1480 kernel/rcu/tree.c:2985
 smpboot_thread_fn+0x57c/0xa80 kernel/smpboot.c:160
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Last potentially related work creation:
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 __call_rcu_common kernel/rcu/tree.c:3159 [inline]
 call_rcu+0xee/0x8b0 kernel/rcu/tree.c:3279
 dentry_kill+0x4d3/0x880 fs/dcache.c:845
 finish_dput+0x1a/0x260 fs/dcache.c:1001
 __fput+0x699/0xa80 fs/file_table.c:520
 task_work_run+0x1d9/0x270 kernel/task_work.c:233
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 __exit_to_user_mode_loop kernel/entry/common.c:70 [inline]
 exit_to_user_mode_loop+0x1fa/0x730 kernel/entry/common.c:101
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:207 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:230 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:318 [inline]
 do_syscall_64+0x353/0x580 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff8880400e54a0
 which belongs to the cache dentry of size 376
The buggy address is located 208 bytes inside of
 freed 376-byte region [ffff8880400e54a0, ffff8880400e5618)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x400e4
head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff888031f5da01
flags: 0x80000000000040(head|node=0|zone=1)
page_type: f5(slab)
raw: 0080000000000040 ffff88801be88500 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800120012 00000000f5000000 ffff888031f5da01
head: 0080000000000040 ffff88801be88500 dead000000000100 dead000000000122
head: 0000000000000000 0000000800120012 00000000f5000000 ffff888031f5da01
head: 0080000000000001 ffffffffffffff81 00000000ffffffff 00000000ffffffff
head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000002
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 1, migratetype Reclaimable, gfp_mask 0xd20d0(__GFP_RECLAIMABLE|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4988, tgid 4988 (udevd), ts 46901016481, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x1f9/0x250 mm/page_alloc.c:1859
 prep_new_page mm/page_alloc.c:1867 [inline]
 get_page_from_freelist+0x2639/0x26b0 mm/page_alloc.c:3946
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5304
 alloc_slab_page mm/slub.c:3292 [inline]
 allocate_slab+0x79/0x5e0 mm/slub.c:3406
 new_slab mm/slub.c:3452 [inline]
 refill_objects+0x2d8/0x350 mm/slub.c:7335
 refill_sheaf mm/slub.c:2830 [inline]
 __pcs_replace_empty_main+0x330/0x690 mm/slub.c:4701
 alloc_from_pcs mm/slub.c:4799 [inline]
 slab_alloc_node mm/slub.c:4931 [inline]
 kmem_cache_alloc_lru_noprof+0x45e/0x6a0 mm/slub.c:4976
 __d_alloc+0x37/0x6f0 fs/dcache.c:1902
 d_alloc+0x4b/0x190 fs/dcache.c:1981
 lookup_one_qstr_excl+0xd8/0x360 fs/namei.c:1806
 __start_dirop fs/namei.c:2920 [inline]
 start_dirop fs/namei.c:2942 [inline]
 filename_create+0x20e/0x370 fs/namei.c:4951
 filename_symlinkat+0xf7/0x420 fs/namei.c:5675
 __do_sys_symlink fs/namei.c:5708 [inline]
 __se_sys_symlink+0x4d/0x2b0 fs/namei.c:5704
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page_owner free stack trace missing

Memory state around the buggy address:
 ffff8880400e5400: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
 ffff8880400e5480: fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb fb
>ffff8880400e5500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
 ffff8880400e5580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880400e5600: fb fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 13+ messages in thread

* rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree]
  2026-06-17 17:08 [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree syzbot
@ 2026-06-18 18:44 ` Jann Horn
  2026-06-18 20:59   ` Al Viro
  0 siblings, 1 reply; 13+ messages in thread
From: Jann Horn @ 2026-06-18 18:44 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Waiman Long, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt
  Cc: syzbot, Christian Brauner, Jan Kara, linux-fsdevel, kernel list,
	syzkaller-bugs, Al Viro

I think this is more of a bug in RT spinlocks than a VFS bug, though
it's a bit murky.

rt_spin_unlock() looks like this:

void __sched rt_spin_unlock(spinlock_t *lock) __releases(RCU)
{
        spin_release(&lock->dep_map, _RET_IP_);
        migrate_enable();
        rcu_read_unlock();

        if (unlikely(!rt_mutex_cmpxchg_release(&lock->lock, current, NULL)))
                rt_mutex_slowunlock(&lock->lock);
}

Note how the RCU read-side critical section and the protection against
migration end *before* the lock is actually released, which means this
can UAF if the RCU read-side critical section implied by the spinlock
is the only thing keeping the lock alive. Non-RT spinlocks, by contrast,
do this the other way around (do_raw_spin_unlock() before
preempt_enable()):

static inline void __raw_spin_unlock(raw_spinlock_t *lock)
        __releases(lock)
{
        spin_release(&lock->dep_map, _RET_IP_);
        do_raw_spin_unlock(lock);
        preempt_enable();
}

https://docs.kernel.org/next/RCU/whatisRCU.html guarantees that
spinlock APIs imply RCU, and
https://docs.kernel.org/locking/mutex-design.html says: "This is in
contrast with spin_unlock() [...], which APIs can be used to guarantee
that the memory is not touched by the lock implementation after
spin_unlock()/completion_done() releases the lock.".
Neither of these explicitly guarantees that the RCU read-side critical
section (and the protection against migration?) should still hold
while the lock is being dropped, but I think that would fit best with
the explicit guarantees?


* Re: rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree]
  2026-06-18 18:44 ` rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree] Jann Horn
@ 2026-06-18 20:59   ` Al Viro
  2026-06-18 21:03     ` Al Viro
  2026-06-21  0:46     ` rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree] Jeff Layton
  0 siblings, 2 replies; 13+ messages in thread
From: Al Viro @ 2026-06-18 20:59 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Waiman Long, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, syzbot, Christian Brauner,
	Jan Kara, linux-fsdevel, kernel list, syzkaller-bugs, Jeff Layton

On Thu, Jun 18, 2026 at 08:44:32PM +0200, Jann Horn wrote:
> I think this is more of a bug in RT spinlocks than a VFS bug, though
> it's a bit murky.
> 
> rt_spin_unlock() looks like this:
> 
> void __sched rt_spin_unlock(spinlock_t *lock) __releases(RCU)
> {
>         spin_release(&lock->dep_map, _RET_IP_);
>         migrate_enable();
>         rcu_read_unlock();
> 
>         if (unlikely(!rt_mutex_cmpxchg_release(&lock->lock, current, NULL)))
>                 rt_mutex_slowunlock(&lock->lock);
> }
> 
> Note how the RCU read-side critical section and the protection against
> migration end *before* the lock is actually released, which means this
> can UAF if the RCU read-side critical section implied by the spinlock
> is the only thing keeping the lock alive. While non-RT spinlocks do
> this the other way around (do_raw_spin_unlock() before
> preempt_enable()):
> 
> static inline void __raw_spin_unlock(raw_spinlock_t *lock)
>         __releases(lock)
> {
>         spin_release(&lock->dep_map, _RET_IP_);
>         do_raw_spin_unlock(lock);
>         preempt_enable();
> }
> 
> https://docs.kernel.org/next/RCU/whatisRCU.html guarantees that
> spinlock APIs imply RCU, and
> https://docs.kernel.org/locking/mutex-design.html says: "This is in
> contrast with spin_unlock() [...], which APIs can be used to guarantee
> that the memory is not touched by the lock implementation after
> spin_unlock()/completion_done() releases the lock.".
> Neither of these explicitly guarantees that the RCU read-side critical
> section (and the protection against migration?) should still hold
> while the lock is being dropped, but I think that would fit best with
> the explicit guarantees?

I'm trying to recall if PREEMPT_RT had been enabled in the last round of
UAF in that area back in early April...

As far as I'm concerned, we *do* need to keep RCU read-side critical area
all the way until the end of spin_unlock(); it very well might be the
only thing to prevent freeing the sucker under us.


* Re: rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree]
  2026-06-18 20:59   ` Al Viro
@ 2026-06-18 21:03     ` Al Viro
  2026-06-18 22:24       ` Thomas Gleixner
  2026-06-21  0:46     ` rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree] Jeff Layton
  1 sibling, 1 reply; 13+ messages in thread
From: Al Viro @ 2026-06-18 21:03 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Waiman Long, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, syzbot, Christian Brauner,
	Jan Kara, linux-fsdevel, kernel list, syzkaller-bugs, Jeff Layton

On Thu, Jun 18, 2026 at 09:59:53PM +0100, Al Viro wrote:

> > https://docs.kernel.org/next/RCU/whatisRCU.html guarantees that
> > spinlock APIs imply RCU, and
> > https://docs.kernel.org/locking/mutex-design.html says: "This is in
> > contrast with spin_unlock() [...], which APIs can be used to guarantee
> > that the memory is not touched by the lock implementation after
> > spin_unlock()/completion_done() releases the lock.".
> > Neither of these explicitly guarantees that the RCU read-side critical
> > section (and the protection against migration?) should still hold
> > while the lock is being dropped, but I think that would fit best with
> > the explicit guarantees?
> 
> I'm trying to recall if PREEMPT_RT had been enabled in the last round of
> UAF in that area back in early April...
> 
> As far as I'm concerned, we *do* need to keep RCU read-side critical area
> all the way until the end of spin_unlock(); it very well might be the
> only thing to prevent freeing the sucker under us.

FWIW, https://lore.kernel.org/all/6a3094e7.428ffe26.258b27.0171.GAE@google.com/
looks potentially related...


* Re: rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree]
  2026-06-18 21:03     ` Al Viro
@ 2026-06-18 22:24       ` Thomas Gleixner
  2026-06-19  1:36         ` Al Viro
  2026-06-19  8:39         ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2026-06-18 22:24 UTC (permalink / raw)
  To: Al Viro, Jann Horn
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt, syzbot,
	Christian Brauner, Jan Kara, linux-fsdevel, kernel list,
	syzkaller-bugs, Jeff Layton

On Thu, Jun 18 2026 at 22:03, Al Viro wrote:

> On Thu, Jun 18, 2026 at 09:59:53PM +0100, Al Viro wrote:
>> > https://docs.kernel.org/next/RCU/whatisRCU.html guarantees that
>> > spinlock APIs imply RCU, and
>> > https://docs.kernel.org/locking/mutex-design.html says: "This is in
>> > contrast with spin_unlock() [...], which APIs can be used to guarantee
>> > that the memory is not touched by the lock implementation after
>> > spin_unlock()/completion_done() releases the lock.".
>> > Neither of these explicitly guarantees that the RCU read-side critical
>> > section (and the protection against migration?) should still hold
>> > while the lock is being dropped, but I think that would fit best with
>> > the explicit guarantees?
>> 
>> I'm trying to recall if PREEMPT_RT had been enabled in the last round of
>> UAF in that area back in early April...
>> 
>> As far as I'm concerned, we *do* need to keep RCU read-side critical area
>> all the way until the end of spin_unlock(); it very well might be the
>> only thing to prevent freeing the sucker under us.

Right. That's clearly a bug in rt_spin_unlock(). I think I wrote it that
way for symmetry vs. lock(), which is obviously wrong.

Fix below.

Thanks,

        tglx
---
Subject: locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
From: Thomas Gleixner <tglx@kernel.org>
Date: Thu, 18 Jun 2026 23:32:43 +0200

rt_spin_unlock() releases the RCU protection before unlocking the
lock. That opens the door for the following UAF scenario:

 T1					T2
 spin_lock(&p->lock);		rcu_read_lock();
 invalidate(p);			p = rcu_dereference(ptr);
 rcu_assign_pointer(ptr, NULL);	if (!p) return; // Not taken
 spin_unlock(&p->lock);		spin_lock(&p->lock)
 				   lock(&lock->lock);
				   rcu_read_lock();
 kfree_rcu(p);			rcu_read_unlock();
				....
				spin_unlock(&p->lock)
				  rcu_read_unlock(); // Ends grace period
 rcu_do_batch()
   kfree(p);
			 UAF ->	  rt_mutex_cmpxchg_release(&lock->lock...)

Regular spinlocks keep preemption disabled across the unlock operation,
which provides full RCU protection, but the RT substitution fails to
resemble that.

Move the rcu_read_unlock() invocation past the unlock operation to match
the non-RT semantics and add a comment explaining why rcu_read_unlock()
must come last.

This makes it asymmetric vs. rt_spin_lock(), but that's harmless as the
caller needs to hold RCU read lock across the lock operation. The
migrate_enable() call stays before the unlock operation because there is
no per CPU operation in the unlock path which would require migration to
be kept disabled.

Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com
Decoded-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
---
 kernel/locking/spinlock_rt.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -79,10 +79,27 @@ void __sched rt_spin_unlock(spinlock_t *
 {
 	spin_release(&lock->dep_map, _RET_IP_);
 	migrate_enable();
-	rcu_read_unlock();
 
 	if (unlikely(!rt_mutex_cmpxchg_release(&lock->lock, current, NULL)))
 		rt_mutex_slowunlock(&lock->lock);
+
+	/*
+	 * This must be last to prevent the following UAF:
+	 *
+	 * T1					T2
+	 * spin_lock(&p->lock);			rcu_read_lock();
+	 * invalidate(p);			p = rcu_dereference(ptr);
+	 * rcu_assign_pointer(ptr, NULL);	if (!p) return;
+	 * spin_unlock(&p->lock);		spin_lock(&p->lock);
+	 * kfree_rcu(p);			rcu_read_unlock();
+	 *					....
+	 *					spin_unlock(&p->lock)
+	 *					  rcu_read_unlock(); // Ends grace period
+	 * rcu_do_batch()
+	 *   kfree(p);
+	 *			    UAF ->	  rt_mutex_cmpxchg_release(&p->lock.lock...)
+	 */
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(rt_spin_unlock);
 


* Re: rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree]
  2026-06-18 22:24       ` Thomas Gleixner
@ 2026-06-19  1:36         ` Al Viro
  2026-06-19  8:39         ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 13+ messages in thread
From: Al Viro @ 2026-06-19  1:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jann Horn, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, syzbot, Christian Brauner, Jan Kara,
	linux-fsdevel, kernel list, syzkaller-bugs, Jeff Layton,
	Paul E. McKenney

On Fri, Jun 19, 2026 at 12:24:58AM +0200, Thomas Gleixner wrote:
> Subject: locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
> From: Thomas Gleixner <tglx@kernel.org>
> Date: Thu, 18 Jun 2026 23:32:43 +0200
> 
> rt_spin_unlock() releases the RCU protection before unlocking the
> lock. That opens the door for the following UAF scenario:
> 
>  T1					T2
>  spin_lock(&p->lock);		rcu_read_lock();
>  invalidate(p);			p = rcu_dereference(ptr);
>  rcu_assign_pointer(ptr, NULL);	if (!p) return; // Not taken
>  spin_unlock(&p->lock);		spin_lock(&p->lock)
>  				   lock(&lock->lock);
> 				   rcu_read_lock();
>  kfree_rcu(p);			rcu_read_unlock();
> 				....
> 				spin_unlock(&p->lock)
> 				  rcu_read_unlock(); // Ends grace period
>  rcu_do_batch()
>    kfree(p);
> 			 UAF ->	  rt_mutex_cmpxchg_release(&lock->lock...)
> 
> Regular spinlocks keep preemption disabled across the unlock operation,
> which provides full RCU protection, but the RT substitution fails to
> replicate that.
> 
> Move the rcu_read_unlock() invocation past the unlock operation to match
> the non-RT semantics and add a comment explaining why rcu_read_unlock()
> must come last.
> 
> This makes it asymmetric vs. rt_spin_lock(), but that's harmless as the
> caller needs to hold RCU read lock across the lock operation. The
> migrate_enable() call stays before the unlock operation because there is
> no per-CPU operation in the unlock path that would require migration to
> be kept disabled.
> 
> Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
> Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com
> Decoded-by: Jann Horn <jannh@google.com>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: stable@vger.kernel.org

IIRC, something very similar was mentioned in the dentry UAF threads back in April...
<digs around>
https://lore.kernel.org/all/a0f19c52-47d2-41d9-995a-4bbc6bb73c13@paulmck-laptop/

There it covered rt_read_unlock() and rt_write_unlock() as well; AFAICS both are
in the same situation and at that point there's nothing left held to prevent
the RCU callbacks from running.
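Al's point can be illustrated with a minimal, single-threaded C model. It is a sketch, not kernel code: every `toy_*` name is hypothetical, and it assumes a toy RCU in which the queued free runs the instant the last reader leaves, i.e. the worst-case timing from the UAF diagram. The model replays T2's side of that diagram against one generic unlock, so the same reasoning would apply to the spinlock and the rwlock paths.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object standing in for the RCU-protected structure that embeds the lock. */
struct toy_obj {
	int lock_owner;   /* nonzero while "locked" */
	bool freed;       /* set by the deferred free instead of calling free() */
};

static int readers;                  /* toy RCU: global read-side nesting */
static struct toy_obj *pending_free; /* toy RCU: one object queued by "kfree_rcu" */

static void toy_rcu_read_lock(void) { readers++; }

static void toy_rcu_read_unlock(void)
{
	/*
	 * Worst case: the grace period ends (and the callback runs) the
	 * moment the last reader leaves.
	 */
	if (--readers == 0 && pending_free) {
		pending_free->freed = true;   /* models rcu_do_batch() -> kfree() */
		pending_free = NULL;
	}
}

/* T2's spin_unlock(); returns true if the lock word is written after the free. */
static bool toy_unlock(struct toy_obj *p, bool fixed_order)
{
	bool uaf;

	if (!fixed_order)
		toy_rcu_read_unlock();  /* old order: protection dropped first */
	uaf = p->freed;             /* would the release store below hit freed memory? */
	p->lock_owner = 0;          /* models rt_mutex_cmpxchg_release() */
	if (fixed_order)
		toy_rcu_read_unlock();  /* new order: protection dropped last */
	return uaf;
}

/* Replays T2's side of the interleaving; returns true on use-after-free. */
bool simulate_unlock_race(bool fixed_order)
{
	struct toy_obj o = { .lock_owner = 0, .freed = false };

	readers = 0;
	pending_free = NULL;
	toy_rcu_read_lock();    /* T2: rcu_read_lock(); p = rcu_dereference(ptr) */
	toy_rcu_read_lock();    /* T2: spin_lock() takes an implicit RCU read lock */
	o.lock_owner = 2;
	pending_free = &o;      /* T1: p already unpublished, now kfree_rcu(p) */
	toy_rcu_read_unlock();  /* T2: explicit rcu_read_unlock() */
	return toy_unlock(&o, fixed_order);  /* T2: spin_unlock(&p->lock) */
}
```

In this model `simulate_unlock_race(false)` (the old order) reports a use-after-free, while `simulate_unlock_race(true)` (the patched order) does not, because the deferred free cannot run until the lock word has been written.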

That said, this is _not_ the same thing Jeff had been hitting - the config in
question didn't have PREEMPT_RT, so there was something separate ;-/

* Re: rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree]
  2026-06-18 22:24       ` Thomas Gleixner
  2026-06-19  1:36         ` Al Viro
@ 2026-06-19  8:39         ` Sebastian Andrzej Siewior
  2026-06-19 12:46           ` Thomas Gleixner
  1 sibling, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-19  8:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Al Viro, Jann Horn, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Waiman Long, Clark Williams, Steven Rostedt, syzbot,
	Christian Brauner, Jan Kara, linux-fsdevel, kernel list,
	syzkaller-bugs, Jeff Layton

On 2026-06-19 00:24:58 [+0200], Thomas Gleixner wrote:
> Right. That's clearly a bug in rt_spin_unlock(). I think I wrote it that
> way for symmetry vs. lock(), which is obviously wrong.

and yet we have had it like that since day one.

> Fix below.
> 
> Thanks,
> 
>         tglx
> ---
> Subject: locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
> From: Thomas Gleixner <tglx@kernel.org>
> Date: Thu, 18 Jun 2026 23:32:43 +0200
> 
> rt_spin_unlock() releases the RCU protection before unlocking the
> lock. That opens the door for the following UAF scenario:
…

would you mind folding the following? I don't see why the rwlocks should
be treated any differently.

diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index db1e11b45de67..4fb77daafd758 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -262,17 +262,21 @@ void __sched rt_read_unlock(rwlock_t *rwlock) __releases(RCU)
 {
 	rwlock_release(&rwlock->dep_map, _RET_IP_);
 	migrate_enable();
-	rcu_read_unlock();
 	rwbase_read_unlock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
+
+	/* This must be last. See comment in rt_spin_unlock() */
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(rt_read_unlock);
 
 void __sched rt_write_unlock(rwlock_t *rwlock) __releases(RCU)
 {
 	rwlock_release(&rwlock->dep_map, _RET_IP_);
-	rcu_read_unlock();
 	migrate_enable();
 	rwbase_write_unlock(&rwlock->rwbase);
+
+	/* This must be last. See comment in rt_spin_unlock() */
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(rt_write_unlock);
 

* Re: rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree]
  2026-06-19  8:39         ` Sebastian Andrzej Siewior
@ 2026-06-19 12:46           ` Thomas Gleixner
  2026-06-19 12:52             ` [PATCH V2] locking/rt: Fix the incorrect RCU protection in rt_spin_unlock() Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2026-06-19 12:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Al Viro, Jann Horn, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Waiman Long, Clark Williams, Steven Rostedt, syzbot,
	Christian Brauner, Jan Kara, linux-fsdevel, kernel list,
	syzkaller-bugs, Jeff Layton

On Fri, Jun 19 2026 at 10:39, Sebastian Andrzej Siewior wrote:
> On 2026-06-19 00:24:58 [+0200], Thomas Gleixner wrote:
>
> would you mind folding the following? I don't see why the rwlocks should
> be treated any differently.

Duh. I wanted to look at it but my brain only works partially in this
heat ...

> diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
> index db1e11b45de67..4fb77daafd758 100644
> --- a/kernel/locking/spinlock_rt.c
> +++ b/kernel/locking/spinlock_rt.c
> @@ -262,17 +262,21 @@ void __sched rt_read_unlock(rwlock_t *rwlock) __releases(RCU)
>  {
>  	rwlock_release(&rwlock->dep_map, _RET_IP_);
>  	migrate_enable();
> -	rcu_read_unlock();
>  	rwbase_read_unlock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
> +
> +	/* This must be last. See comment in rt_spin_unlock() */
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(rt_read_unlock);
>  
>  void __sched rt_write_unlock(rwlock_t *rwlock) __releases(RCU)
>  {
>  	rwlock_release(&rwlock->dep_map, _RET_IP_);
> -	rcu_read_unlock();
>  	migrate_enable();
>  	rwbase_write_unlock(&rwlock->rwbase);
> +
> +	/* This must be last. See comment in rt_spin_unlock() */
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(rt_write_unlock);
>  

* [PATCH V2] locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
  2026-06-19 12:46           ` Thomas Gleixner
@ 2026-06-19 12:52             ` Thomas Gleixner
  2026-06-19 12:58               ` Sebastian Andrzej Siewior
  2026-06-20  6:44               ` Al Viro
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2026-06-19 12:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Al Viro, Jann Horn, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Waiman Long, Clark Williams, Steven Rostedt, syzbot,
	Christian Brauner, Jan Kara, linux-fsdevel, kernel list,
	syzkaller-bugs, Jeff Layton

rt_spin_unlock() releases the RCU protection before unlocking the
lock. That opens the door for the following UAF scenario:

 T1					T2
 spin_lock(&p->lock);		rcu_read_lock();
 invalidate(p);			p = rcu_dereference(ptr);
 rcu_assign_pointer(ptr, NULL);	if (!p) return;
 spin_unlock(&p->lock);		spin_lock(&p->lock)
 				   lock(&lock->lock);
				   rcu_read_lock();
 kfree_rcu(p);			rcu_read_unlock();
				....
				spin_unlock(&p->lock)
				  rcu_read_unlock(); // Ends grace period
 rcu_do_batch()
   kfree(p);
			    UAF ->	  rt_mutex_cmpxchg_release(&lock->lock...)

Regular spinlocks keep preemption disabled across the unlock operation,
which provides full RCU protection, but the RT substitution fails to
replicate that. The same applies to the rwlock substitution.

Move the rcu_read_unlock() invocation past the unlock operations to match
the non-RT semantics. This makes it asymmetric vs. rt_xxx_lock(), but
that's harmless as the caller needs to hold RCU read lock across the lock
operation. The migrate_enable() call stays before the unlock operation
because there is no per-CPU operation in the unlock path that would
require migration to be kept disabled.
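The resulting order of operations can be sketched in userspace C11. This is an assumption-laden toy, not the kernel implementation: `struct toy_rtlock`, `toy_slowunlock()` and the trace buffer are hypothetical, a release-ordered compare-and-swap on an owner word stands in for the `rt_mutex_cmpxchg_release()` fast path, and plain counters stand in for the RCU read-side nesting and the migrate_disable() depth.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy lock: just the owner word of an rt_mutex-style lock. */
struct toy_rtlock {
	_Atomic(void *) owner;
};

/* Operations recorded in order, so the unlock sequence can be inspected. */
enum toy_op { OP_MIGRATE_ENABLE, OP_CMPXCHG_RELEASE, OP_SLOWUNLOCK, OP_RCU_READ_UNLOCK };
static enum toy_op trace[8];
static int ntrace;

static int rcu_nesting;       /* toy stand-in for the RCU read-side nesting */
static int migrate_disabled;  /* toy stand-in for the migrate_disable() depth */

static void log_op(enum toy_op op) { trace[ntrace++] = op; }

static void toy_migrate_enable(void) { migrate_disabled--; log_op(OP_MIGRATE_ENABLE); }
static void toy_rcu_read_unlock(void) { rcu_nesting--; log_op(OP_RCU_READ_UNLOCK); }

static void toy_slowunlock(struct toy_rtlock *l)
{
	/* A real slow path would hand the lock to a waiter; the toy just clears it. */
	atomic_store_explicit(&l->owner, NULL, memory_order_release);
	log_op(OP_SLOWUNLOCK);
}

/* Same order of operations as the patched rt_spin_unlock(). */
void toy_rt_spin_unlock(struct toy_rtlock *l, void *self)
{
	void *expected = self;

	/* Nothing below needs per-CPU state, so migration can be re-enabled first. */
	toy_migrate_enable();

	/*
	 * Fast path: release-ordered cmpxchg of owner from self to NULL.
	 * In this toy it fails whenever owner != self, standing in for the
	 * real case where waiters are queued on the lock.
	 */
	if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, NULL,
						    memory_order_release,
						    memory_order_relaxed))
		log_op(OP_CMPXCHG_RELEASE);
	else
		toy_slowunlock(l);

	/* Last: the lock word is no longer touched after this point. */
	toy_rcu_read_unlock();
}
```

Calling `toy_rt_spin_unlock()` on a lock owned by the caller records the migrate_enable step, the release, and the RCU unlock in that order; under the same assumptions, the rwlock variants would differ only in their release step.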

Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com
Decoded-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
---
V2: Fold the corresponding rwlock changes - Sebastian
---
 kernel/locking/spinlock_rt.c |   27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -79,10 +79,27 @@ void __sched rt_spin_unlock(spinlock_t *
 {
 	spin_release(&lock->dep_map, _RET_IP_);
 	migrate_enable();
-	rcu_read_unlock();
 
 	if (unlikely(!rt_mutex_cmpxchg_release(&lock->lock, current, NULL)))
 		rt_mutex_slowunlock(&lock->lock);
+
+	/*
+	 * This must be last to prevent the following UAF:
+	 *
+	 * T1					T2
+	 * spin_lock(&p->lock);			rcu_read_lock();
+	 * invalidate(p);			p = rcu_dereference(ptr);
+	 * rcu_assign_pointer(ptr, NULL);	if (!p) return;
+	 * spin_unlock(&p->lock);		spin_lock(&p->lock);
+	 * kfree_rcu(p);			rcu_read_unlock();
+	 *					....
+	 *					spin_unlock(&p->lock)
+	 *					  rcu_read_unlock(); // Ends grace period
+	 * rcu_do_batch()
+	 *   kfree(p);
+	 *			    UAF ->	  rt_mutex_cmpxchg_release(&p->lock.lock...)
+	 */
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(rt_spin_unlock);
 
@@ -262,17 +279,21 @@ void __sched rt_read_unlock(rwlock_t *rw
 {
 	rwlock_release(&rwlock->dep_map, _RET_IP_);
 	migrate_enable();
-	rcu_read_unlock();
 	rwbase_read_unlock(&rwlock->rwbase, TASK_RTLOCK_WAIT);
+
+	/* This must be last. See comment in rt_spin_unlock() */
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(rt_read_unlock);
 
 void __sched rt_write_unlock(rwlock_t *rwlock) __releases(RCU)
 {
 	rwlock_release(&rwlock->dep_map, _RET_IP_);
-	rcu_read_unlock();
 	migrate_enable();
 	rwbase_write_unlock(&rwlock->rwbase);
+
+	/* This must be last. See comment in rt_spin_unlock() */
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(rt_write_unlock);
 

* Re: [PATCH V2] locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
  2026-06-19 12:52             ` [PATCH V2] locking/rt: Fix the incorrect RCU protection in rt_spin_unlock() Thomas Gleixner
@ 2026-06-19 12:58               ` Sebastian Andrzej Siewior
  2026-06-20  6:44               ` Al Viro
  1 sibling, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-19 12:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Al Viro, Jann Horn, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Waiman Long, Clark Williams, Steven Rostedt, syzbot,
	Christian Brauner, Jan Kara, linux-fsdevel, kernel list,
	syzkaller-bugs, Jeff Layton

On 2026-06-19 14:52:08 [+0200], Thomas Gleixner wrote:
> rt_spin_unlock() releases the RCU protection before unlocking the
> lock. That opens the door for the following UAF scenario:
…
> Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
> Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com
> Decoded-by: Jann Horn <jannh@google.com>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: stable@vger.kernel.org

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian

* Re: [PATCH V2] locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
  2026-06-19 12:52             ` [PATCH V2] locking/rt: Fix the incorrect RCU protection in rt_spin_unlock() Thomas Gleixner
  2026-06-19 12:58               ` Sebastian Andrzej Siewior
@ 2026-06-20  6:44               ` Al Viro
  2026-06-20 21:45                 ` Thomas Gleixner
  1 sibling, 1 reply; 13+ messages in thread
From: Al Viro @ 2026-06-20  6:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Jann Horn, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Waiman Long, Clark Williams,
	Steven Rostedt, syzbot, Christian Brauner, Jan Kara,
	linux-fsdevel, kernel list, syzkaller-bugs, Jeff Layton

On Fri, Jun 19, 2026 at 02:52:08PM +0200, Thomas Gleixner wrote:
> rt_spin_unlock() releases the RCU protection before unlocking the
> lock. That opens the door for the following UAF scenario:
> 
>  T1					T2
>  spin_lock(&p->lock);		rcu_read_lock();
>  invalidate(p);			p = rcu_dereference(ptr);
>  rcu_assign_pointer(ptr, NULL);	if (!p) return;
>  spin_unlock(&p->lock);		spin_lock(&p->lock)
>  				   lock(&lock->lock);
> 				   rcu_read_lock();
>  kfree_rcu(p);			rcu_read_unlock();
> 				....
> 				spin_unlock(&p->lock)
> 				  rcu_read_unlock(); // Ends grace period
>  rcu_do_batch()
>    kfree(p);
> 			    UAF ->	  rt_mutex_cmpxchg_release(&lock->lock...)
> 
> Regular spinlocks keep preemption disabled across the unlock operation,
> which provides full RCU protection, but the RT substitution fails to
> replicate that. The same applies to the rwlock substitution.
> 
> Move the rcu_read_unlock() invocation past the unlock operations to match
> the non-RT semantics. This makes it asymmetric vs. rt_xxx_lock(), but
> that's harmless as the caller needs to hold RCU read lock across the lock
> operation. The migrate_enable() call stays before the unlock operation
> because there is no per-CPU operation in the unlock path that would
> require migration to be kept disabled.
> 
> Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
> Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com
> Decoded-by: Jann Horn <jannh@google.com>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: stable@vger.kernel.org

Looks sane.

Acked-by: Al Viro <viro@zeniv.linux.org.uk>

If RT folks see no subtle problems with that, it ought to go into mainline ASAP.

* Re: [PATCH V2] locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
  2026-06-20  6:44               ` Al Viro
@ 2026-06-20 21:45                 ` Thomas Gleixner
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2026-06-20 21:45 UTC (permalink / raw)
  To: Al Viro
  Cc: Sebastian Andrzej Siewior, Jann Horn, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Waiman Long, Clark Williams,
	Steven Rostedt, syzbot, Christian Brauner, Jan Kara,
	linux-fsdevel, kernel list, syzkaller-bugs, Jeff Layton

On Sat, Jun 20 2026 at 07:44, Al Viro wrote:
> On Fri, Jun 19, 2026 at 02:52:08PM +0200, Thomas Gleixner wrote:
>> Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
>> Reported-by: syzbot+000c800a02097aaa10ed@syzkaller.appspotmail.com
>> Decoded-by: Jann Horn <jannh@google.com>
>> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
>> Cc: stable@vger.kernel.org
>
> Looks sane.
>
> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
>
> If RT folks see no subtle problems with that, it ought to go into mainline ASAP.

I'll queue it up tomorrow and send it Linuswards ASAP.

* Re: rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree]
  2026-06-18 20:59   ` Al Viro
  2026-06-18 21:03     ` Al Viro
@ 2026-06-21  0:46     ` Jeff Layton
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2026-06-21  0:46 UTC (permalink / raw)
  To: Al Viro, Jann Horn
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Boqun Feng, Waiman Long, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, syzbot, Christian Brauner,
	Jan Kara, linux-fsdevel, kernel list, syzkaller-bugs

On Thu, 2026-06-18 at 21:59 +0100, Al Viro wrote:
> On Thu, Jun 18, 2026 at 08:44:32PM +0200, Jann Horn wrote:
> > I think this is more of a bug in RT spinlocks than a VFS bug, though
> > it's a bit murky.
> > 
> > rt_spin_unlock() looks like this:
> > 
> > void __sched rt_spin_unlock(spinlock_t *lock) __releases(RCU)
> > {
> >         spin_release(&lock->dep_map, _RET_IP_);
> >         migrate_enable();
> >         rcu_read_unlock();
> > 
> >         if (unlikely(!rt_mutex_cmpxchg_release(&lock->lock, current, NULL)))
> >                 rt_mutex_slowunlock(&lock->lock);
> > }
> > 
> > Note how the RCU read-side critical section and the protection against
> > migration end *before* the lock is actually released, which means this
> > can UAF if the RCU read-side critical section implied by the spinlock
> > is the only thing keeping the lock alive. Non-RT spinlocks, by contrast,
> > do this the other way around (do_raw_spin_unlock() before
> > preempt_enable()):
> > 
> > static inline void __raw_spin_unlock(raw_spinlock_t *lock)
> >         __releases(lock)
> > {
> >         spin_release(&lock->dep_map, _RET_IP_);
> >         do_raw_spin_unlock(lock);
> >         preempt_enable();
> > }
> > 
> > https://docs.kernel.org/next/RCU/whatisRCU.html guarantees that
> > spinlock APIs imply RCU, and
> > https://docs.kernel.org/locking/mutex-design.html says: "This is in
> > contrast with spin_unlock() [...], which APIs can be used to guarantee
> > that the memory is not touched by the lock implementation after
> > spin_unlock()/completion_done() releases the lock.".
> > Neither of these explicitly guarantees that the RCU read-side critical
> > section (and the protection against migration?) should still hold
> > while the lock is being dropped, but I think that would fit best with
> > the explicit guarantees?
> 
> I'm trying to recall if PREEMPT_RT had been enabled in the last round of
> UAF in that area back in early April...
> 
> As far as I'm concerned, we *do* need to keep RCU read-side critical area
> all the way until the end of spin_unlock(); it very well might be the
> only thing to prevent freeing the sucker under us.


Sorry for the late reply, but I don't think it was enabled in those kernels.
This was in the mail that I sent on April 10th:

# zgrep PREEMPT /proc/config.gz
CONFIG_PREEMPT_NONE_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_LAZY is not set
# CONFIG_PREEMPT_RT is not set
# CONFIG_PREEMPT_DYNAMIC is not set
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

-- 
Jeff Layton <jlayton@kernel.org>

Thread overview: 13+ messages
2026-06-17 17:08 [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree syzbot
2026-06-18 18:44 ` rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree] Jann Horn
2026-06-18 20:59   ` Al Viro
2026-06-18 21:03     ` Al Viro
2026-06-18 22:24       ` Thomas Gleixner
2026-06-19  1:36         ` Al Viro
2026-06-19  8:39         ` Sebastian Andrzej Siewior
2026-06-19 12:46           ` Thomas Gleixner
2026-06-19 12:52             ` [PATCH V2] locking/rt: Fix the incorrect RCU protection in rt_spin_unlock() Thomas Gleixner
2026-06-19 12:58               ` Sebastian Andrzej Siewior
2026-06-20  6:44               ` Al Viro
2026-06-20 21:45                 ` Thomas Gleixner
2026-06-21  0:46     ` rt_spin_unlock order of operations [was: Re: [syzbot] [fs?] KASAN: slab-use-after-free Read in shrink_dcache_tree] Jeff Layton
