* [BUG] gfs2: sleeping lock in gfs2_quota_init() with preempt disabled on PREEMPT_RT
From: Yunseong Kim @ 2025-08-12  5:39 UTC
  To: Andreas Gruenbacher
  Cc: Christian Brauner, gfs2, linux-fsdevel, linux-rt-devel, syzkaller,
	Austin Kim

While testing with PREEMPT_RT enabled, I encountered a potential issue in
gfs2_quota_init() where preemption is disabled via bit_spin_lock(), but a
subsequent call eventually reaches a sleepable lock.

Below is the simplified call flow:
[PREEMPT_RT]

gfs2_quota_init()
   |
   v
  spin_lock_bucket()
   |-- bit_spin_lock()
   |     |
   |     +--> preempt_disable()   [Preemption disabled]
   v
  gfs2_qd_search_bucket()
       |
       v
     lockref_get_not_dead()
          |
          v
         spin_lock(&lockref->lock)   [spinlock_t: sleeping lock on PREEMPT_RT, taken with preemption disabled]

[NON-PREEMPT_RT]

gfs2_quota_init()
   |
   v
  spin_lock_bucket()
   |-- bit_spin_lock()
   |     |
   |     +--> preempt_disable()   [Preemption disabled]
   v
  gfs2_qd_search_bucket()
       |
       v
     lockref_get_not_dead()
          |
          v
        spin_lock(&lockref->lock)   [Non-sleeping spinlock]
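
For reference, here is a condensed view of the code involved (from my
reading of fs/gfs2/quota.c, include/linux/list_bl.h and
include/linux/bit_spinlock.h; abbreviated, not the literal sources):

/* fs/gfs2/quota.c, condensed */
static void spin_lock_bucket(unsigned int hash)
{
	hlist_bl_lock(&qd_hash_table[hash]);	/* -> bit_spin_lock(0, ...) */
}

/* include/linux/bit_spinlock.h, condensed */
static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	preempt_disable();	/* preemption stays disabled while the bit lock is held */
	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
		/* ... busy wait ... */
	}
}

/* fs/gfs2/quota.c, condensed */
static struct gfs2_quota_data *gfs2_qd_search_bucket(unsigned int hash, ...)
{
	struct gfs2_quota_data *qd;
	struct hlist_bl_node *h;

	hlist_bl_for_each_entry_rcu(qd, h, &qd_hash_table[hash], qd_hlist) {
		/* ... qid/sbd checks ... */
		if (lockref_get_not_dead(&qd->qd_lockref)) {
			/* takes spin_lock(&qd->qd_lockref.lock), a spinlock_t,
			 * i.e. a sleeping lock on PREEMPT_RT */
			return qd;
		}
	}
	return NULL;
}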

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 15311, name: syz.6.947
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
Preemption disabled at:
[<ffff800081fa916c>] bit_spin_lock include/linux/bit_spinlock.h:25 [inline]
[<ffff800081fa916c>] hlist_bl_lock include/linux/list_bl.h:148 [inline]
[<ffff800081fa916c>] spin_lock_bucket+0x60/0x190 fs/gfs2/quota.c:98
CPU: 2 UID: 0 PID: 15311 Comm: syz.6.947 Not tainted 6.16.0-rc1-rt1-dirty #12 PREEMPT_RT 
Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
Call trace:
 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
 __dump_stack+0x30/0x40 lib/dump_stack.c:94
 dump_stack_lvl+0x148/0x1d8 lib/dump_stack.c:120
 dump_stack+0x1c/0x3c lib/dump_stack.c:129
 __might_resched+0x2e4/0x52c kernel/sched/core.c:8800
 __rt_spin_lock kernel/locking/spinlock_rt.c:48 [inline]
 rt_spin_lock+0xa8/0x1bc kernel/locking/spinlock_rt.c:57
 spin_lock include/linux/spinlock_rt.h:44 [inline]
 lockref_get_not_dead+0x30/0xd0 lib/lockref.c:155
 gfs2_qd_search_bucket fs/gfs2/quota.c:269 [inline]
 gfs2_quota_init+0x778/0x1114 fs/gfs2/quota.c:1460
 gfs2_make_fs_rw+0x144/0x26c fs/gfs2/super.c:149
 gfs2_fill_super+0x1448/0x1a38 fs/gfs2/ops_fstype.c:1278
 get_tree_bdev_flags+0x360/0x414 fs/super.c:1679
 get_tree_bdev+0x2c/0x3c fs/super.c:1702
 gfs2_get_tree+0x54/0x1b4 fs/gfs2/ops_fstype.c:1335
 vfs_get_tree+0x90/0x28c fs/super.c:1802
 do_new_mount+0x228/0x814 fs/namespace.c:3885
 path_mount+0x574/0xd64 fs/namespace.c:4209
 do_mount fs/namespace.c:4222 [inline]
 __do_sys_mount fs/namespace.c:4433 [inline]
 __se_sys_mount fs/namespace.c:4410 [inline]
 __arm64_sys_mount+0x3e8/0x468 fs/namespace.c:4410
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x40/0x13c arch/arm64/kernel/entry-common.c:767
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
 el0t_64_sync+0x1ac/0x1b0 arch/arm64/kernel/entry.S:600


On non-RT kernels this is fine because spin_lock() never sleeps, but on
PREEMPT_RT spinlock_t becomes a sleeping lock, so we end up sleeping while
preemption is disabled.
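
For context, the type difference looks roughly like this on PREEMPT_RT
(simplified from memory of include/linux/spinlock_types.h; debug/lockdep
fields omitted):

typedef struct spinlock {
	struct rt_mutex_base	lock;		/* sleeping lock: spin_lock() ->
						 * rt_spin_lock(), may block */
} spinlock_t;

typedef struct raw_spinlock {
	arch_spinlock_t		raw_lock;	/* true busy-wait lock, safe to
						 * take with preemption disabled */
} raw_spinlock_t;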

At the moment, two possible approaches come to mind:

1. Avoid using the bit-spinlock in this path entirely
 - Rework the bucket protection so we never call into code that may sleep
   while preemption is disabled (e.g. a per-bucket spinlock_t; a rough
   sketch follows item 2 below).
 - Pro: no sleeping while preemption is disabled.
 - Con: may require reworking the bucket protection and race re-check logic.

2. Replace the inner lock with a non-sleeping primitive
 - e.g. switch lockref->lock to a raw_spinlock_t so it can be taken while
   preemption is disabled.
 - Since lockref does not currently support raw_spinlock_t, this would
   require additional rework of the lockref structure.
 - Pro: minimal change on the gfs2 side; avoids sleeping in the problematic
   context.
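
Here is a rough, untested sketch of option 1: protecting each hash bucket
with a spinlock_t instead of the hlist_bl bit lock. The qd_hash_locks name
is made up for illustration, and the buckets would either need to move away
from hlist_bl or simply stop using the bit as a lock:

static spinlock_t qd_hash_locks[GFS2_QD_HASH_SIZE];	/* spin_lock_init() at init time */

static void spin_lock_bucket(unsigned int hash)
{
	/* spinlock_t is itself a sleeping lock on PREEMPT_RT, so calling
	 * lockref_get_not_dead() -> spin_lock() underneath it is fine. */
	spin_lock(&qd_hash_locks[hash]);
}

static void spin_unlock_bucket(unsigned int hash)
{
	spin_unlock(&qd_hash_locks[hash]);
}

If the memory cost of one lock per bucket is a concern, a smaller shared
array indexed by a few bits of the hash could work as well.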


I can help test patches for this on both PREEMPT_RT and NON-RT kernels.


Thanks,
Yunseong Kim


* Re: [BUG] gfs2: sleeping lock in gfs2_quota_init() with preempt disabled on PREEMPT_RT
From: Sebastian Andrzej Siewior @ 2025-08-12 10:38 UTC
  To: Yunseong Kim
  Cc: Andreas Gruenbacher, Christian Brauner, gfs2, linux-fsdevel,
	linux-rt-devel, syzkaller, Austin Kim

On 2025-08-12 14:39:29 [+0900], Yunseong Kim wrote:
> While testing with PREEMPT_RT enabled, I encountered a potential issue in
> gfs2_quota_init() where preemption is disabled via bit_spin_lock(), but a
> subsequent call eventually reaches a sleepable lock.
> 
> Below is the simplified call flow:
> [PREEMPT_RT]
> At the moment, two possible approaches come to mind:
> 
> 1. Avoid using the bit-spinlock in this path entirely
>  - Rework the bucket protection so we do not call into code that may sleep
>    while preemption is disabled.
>  - no sleeping while preemption-disabled.
>  - May require reworking bucket protection/race re-check logic.

So this would add 4096 spinlocks for RT. Given that those are a bit bigger
than the other ones, it will use a bit more than 16KiB of memory.
Maybe fewer locks could be used, assuming lock sharing is fine here, but I
haven't checked the code.

> 2. Replace the inner lock with a non-sleeping primitive
>  - (e.g., use a raw_spinlock_t for lockref->lock) so it can be taken while
>    preemption is disabled.
>  - Since lockref does not currently support raw_spinlock_t, this would require
>    additional rework of the lockref structure.
>  - Minimal structural change; avoids sleeping in the problematic context.

This is something.
Looking at the layout of struct lockref, USE_CMPXCHG_LOCKREF can only work
if the spinlock is the plain raw spinlock without lockdep, so that lock and
count fit into a single 64-bit word: it expects the lock to be unlocked and
does a 64-bit cmpxchg() to inc/dec the counter that follows.
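
For reference, struct lockref looks roughly like this (from memory of
include/linux/lockref.h):

struct lockref {
	union {
#if USE_CMPXCHG_LOCKREF
		aligned_u64 lock_count;		/* lock + count as one 64-bit word */
#endif
		struct {
			spinlock_t	lock;	/* must stay small (4 bytes) for
						 * the cmpxchg fast path */
			int		count;
		};
	};
};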

I was confused why lockref_mark_dead() works for dcache given that the
lock is not released outside of lockref.c except for
lockref_put_or_lock(). But then I noticed d_lock.
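
(dcache holds the very same lock through the d_lock alias, roughly:

struct dentry {
	/* ... */
	struct lockref d_lockref;	/* per-dentry lock and refcount */
	/* ... */
};
#define d_lock d_lockref.lock		/* include/linux/dcache.h */

so dcache code taking dentry->d_lock is really taking d_lockref.lock.)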

So making the lock a raw_spinlock_t breaks the few lockref_put_or_lock()
users. The impact on dcache would be huge, and I noticed some waitqueue
handling that would break. So there is that.

Based on this I think it should be handled somehow within gfs2.

> I can help test patches for this on both PREEMPT_RT and NON-RT kernels.
> 
> 
> Thanks,
> Yunseong Kim

Sebastian

