linux-arm-kernel.lists.infradead.org archive mirror
* [BUG] arm64: Sleeping function called from invalid context in do_debug_exception on PREEMPT_RT
From: Yunseong Kim @ 2025-08-13  5:01 UTC
  To: Catalin Marinas, Will Deacon, linux-arm-kernel
  Cc: Mark Rutland, Naresh Kamboju, Paul E. McKenney, Masami Hiramatsu,
	Austin Kim, Yeoreum Yun, linux-rt-devel, syzkaller, stable

Hi,

On a PREEMPT_RT kernel based on v6.16-rc1, I hit the following splat:

| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 20466, name: syz.0.1689
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| Preemption disabled at:
| [<ffff800080241600>] debug_exception_enter arch/arm64/mm/fault.c:978 [inline]
| [<ffff800080241600>] do_debug_exception+0x68/0x2fc arch/arm64/mm/fault.c:997
| CPU: 0 UID: 0 PID: 20466 Comm: syz.0.1689 Not tainted 6.16.0-rc1-rt1-dirty #12 PREEMPT_RT
| Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
| Call trace:
|  show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
|  __dump_stack+0x30/0x40 lib/dump_stack.c:94
|  dump_stack_lvl+0x148/0x1d8 lib/dump_stack.c:120
|  dump_stack+0x1c/0x3c lib/dump_stack.c:129
|  __might_resched+0x2e4/0x52c kernel/sched/core.c:8800
|  __rt_spin_lock kernel/locking/spinlock_rt.c:48 [inline]
|  rt_spin_lock+0xa8/0x1bc kernel/locking/spinlock_rt.c:57
|  spin_lock include/linux/spinlock_rt.h:44 [inline]
|  force_sig_info_to_task+0x6c/0x4a8 kernel/signal.c:1302
|  force_sig_fault_to_task kernel/signal.c:1699 [inline]
|  force_sig_fault+0xc4/0x110 kernel/signal.c:1704
|  arm64_force_sig_fault+0x6c/0x80 arch/arm64/kernel/traps.c:265
|  send_user_sigtrap arch/arm64/kernel/debug-monitors.c:237 [inline]
|  single_step_handler+0x1f4/0x36c arch/arm64/kernel/debug-monitors.c:257
|  do_debug_exception+0x154/0x2fc arch/arm64/mm/fault.c:1002
|  el0_dbg+0x44/0x120 arch/arm64/kernel/entry-common.c:756
|  el0t_64_sync_handler+0x3c/0x108 arch/arm64/kernel/entry-common.c:832
|  el0t_64_sync+0x1ac/0x1b0 arch/arm64/kernel/entry.S:600


It seems that commit eaff68b32861 ("arm64: entry: Add entry and exit functions
for debug exception") in 6.17-rc1, also present as 6fb44438a5e1 in mainline,
removed code that previously avoided sleeping context issues when handling
debug exceptions:
Link: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/arch/arm64/mm/fault.c?id=eaff68b3286116d499a3d4e513a36d772faba587

This appears to be triggered when force_sig_fault() is called from
debug exception context, which is not sleepable under PREEMPT_RT.

I understand that this path is primarily for debugging, but I would like
to discuss whether the patch needs some adjustment for PREEMPT_RT.

I also found that whether the issue reproduces depends on the changes
introduced by the following commit:
Link: https://github.com/torvalds/linux/commit/d8bb6718c4d

  arm64: Make debug exception handlers visible from RCU
  Make debug exceptions visible from RCU so that synchronize_rcu()
  correctly tracks the debug exception handler.

  This also introduces sanity checks for user-mode exceptions as same
  as x86's ist_enter()/ist_exit().

  The debug exception can interrupt in idle task. For example, it warns
  if we put a kprobe on a function called from idle task as below.
  The warning message showed that the rcu_read_lock() caused this
  problem. But actually, this means the RCU lost the context which
  was already in NMI/IRQ.

    /sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events
    /sys/kernel/debug/tracing # echo 1 > events/kprobes/enable
    ...

For reference:
- v5.2.10: https://elixir.bootlin.com/linux/v5.2.10/source/arch/arm64/mm/fault.c#L810
- v5.3-rc3: https://elixir.bootlin.com/linux/v5.3-rc3/source/arch/arm64/mm/fault.c#L787


Do we need to restore some form of non-sleeping signal delivery in debug
exception context for PREEMPT_RT, or is there another preferred fix?

Thanks,
Yunseong
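
For readers unfamiliar with the check behind the splat above, here is a
minimal user-space sketch. It is not kernel code. It assumes a simplified
model in which a sleeping lock complains whenever it is taken while a
preemption counter is non-zero, which is what "preempt_count: 1, expected: 0"
reports. Every name ending in _model is hypothetical and only mirrors the
call chain shown in the trace.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-task preemption counter. */
static int preempt_count_model;
/* Number of "sleeping function called from invalid context" reports. */
static int splat_count_model;

static void preempt_disable_model(void) { preempt_count_model++; }

static void preempt_enable_model(void)
{
    assert(preempt_count_model > 0);   /* catch unbalanced enable */
    preempt_count_model--;
}

/* Models the might-sleep check: report if called in atomic context. */
static void might_sleep_model(const char *who)
{
    if (preempt_count_model != 0) {
        splat_count_model++;
        printf("BUG (model): %s called with preempt_count=%d, expected 0\n",
               who, preempt_count_model);
    }
}

/* On PREEMPT_RT a spinlock_t is a sleeping lock, so taking it may sleep. */
static void rt_spin_lock_model(void) { might_sleep_model("rt_spin_lock"); }

/* Signal delivery takes a spinlock_t, as force_sig_info_to_task() does. */
static void force_sig_model(void) { rt_spin_lock_model(); }

/* Models the reported path: preemption is disabled around the handler. */
static void debug_exception_model(void)
{
    preempt_disable_model();
    force_sig_model();
    preempt_enable_model();
}
```

Calling debug_exception_model() once prints one report and leaves
splat_count_model at 1, because the lock is reached with the counter at 1.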


* Re: [BUG] arm64: Sleeping function called from invalid context in do_debug_exception on PREEMPT_RT
From: Yeoreum Yun @ 2025-08-13  6:56 UTC
  To: Yunseong Kim
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, Mark Rutland,
	Naresh Kamboju, Paul E. McKenney, Masami Hiramatsu, Austin Kim,
	linux-rt-devel, syzkaller, stable

Hi Yunseong,

> It seems that commit eaff68b32861 ("arm64: entry: Add entry and exit functions
> for debug exception") in 6.17-rc1, also present as 6fb44438a5e1 in mainline,
> removed code that previously avoided sleeping context issues when handling
> debug exceptions:
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/arch/arm64/mm/fault.c?id=eaff68b3286116d499a3d4e513a36d772faba587

No. Ada's commit 31575e11ecf7 ("arm64: debug: split brk64 exception entry")
solves your splat: it splits the el0/el1 brk64 exception entries, so
el0_brk64() no longer calls debug_exception_enter().

Formerly, el0_dbg() and el1_dbg() were both handled in do_debug_exception(),
which calls debug_exception_enter() and disables preemption. That is what
triggers your splat while handling a brk exception from el0.

[...]

Thanks.

--
Sincerely,
Yeoreum Yun


* Re: [BUG] arm64: Sleeping function called from invalid context in do_debug_exception on PREEMPT_RT
From: Yunseong Kim @ 2025-08-13  7:42 UTC
  To: Yeoreum Yun
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, Mark Rutland,
	Naresh Kamboju, Paul E. McKenney, Masami Hiramatsu, Austin Kim,
	linux-rt-devel, syzkaller, stable

Hi Yeoreum,

Thank you for pointing that out!

On 8/13/25 3:56 PM, Yeoreum Yun wrote:
> Hi Yunseong,
> 
> No. Ada's commit 31575e11ecf7 ("arm64: debug: split brk64 exception entry")
> solves your splat: it splits the el0/el1 brk64 exception entries, so
> el0_brk64() no longer calls debug_exception_enter().
>
> Formerly, el0_dbg() and el1_dbg() were both handled in do_debug_exception(),
> which calls debug_exception_enter() and disables preemption. That is what
> triggers your splat while handling a brk exception from el0.
> 

Do you think a fix is necessary if this issue also affects the LTS kernel
before 6.17-rc1? As far as I know, most production RT kernels are still
based on the existing LTS versions.

Thank you,
Yunseong



* Re: [BUG] arm64: Sleeping function called from invalid context in do_debug_exception on PREEMPT_RT
From: Yeoreum Yun @ 2025-08-13  8:59 UTC
  To: Yunseong Kim
  Cc: Mark Rutland, Paul E. McKenney, Catalin Marinas, Naresh Kamboju,
	Austin Kim, stable, syzkaller, Masami Hiramatsu, Will Deacon,
	linux-rt-devel, linux-arm-kernel

+Ada Couprie Diaz

> Do you think a fix is necessary if this issue also affects the LTS kernel
> before 6.17-rc1? As far as I know, most production RT kernels are still
> based on the existing LTS versions.

IMHO, I think her patch should be backported.

[0]: https://lore.kernel.org/all/20250707114109.35672-1-ada.coupriediaz@arm.com/

Thanks.

--
Sincerely,
Yeoreum Yun


* Re: [BUG] arm64: Sleeping function called from invalid context in do_debug_exception on PREEMPT_RT
From: Luis Claudio R. Goncalves @ 2025-08-13 10:06 UTC
  To: Yeoreum Yun
  Cc: Mark Rutland, Paul E. McKenney, Catalin Marinas, Naresh Kamboju,
	Austin Kim, stable, syzkaller, Masami Hiramatsu, Will Deacon,
	linux-rt-devel, linux-arm-kernel, Yunseong Kim

On Wed, Aug 13, 2025 at 09:59:06AM +0100, Yeoreum Yun wrote:
> +Ada Couprie Diaz
> 
> > Do you think a fix is necessary if this issue also affects the LTS kernel
> > before 6.17-rc1? As far as I know, most production RT kernels are still
> > based on the existing LTS versions.
> 
> IMHO, I think her patch should be backported.

I also strongly suggest backporting Ada's patch series, as without them
using anything that resorts to debug exceptions (ptrace, gdb, ...) on
aarch64 with PREEMPT_RT enabled may result in a backtrace or worse.

Luis




* Re: [BUG] arm64: Sleeping function called from invalid context in do_debug_exception on PREEMPT_RT
From: Ada Couprie Diaz @ 2025-08-13 11:43 UTC
  To: Luis Claudio R. Goncalves, Yeoreum Yun, Yunseong Kim
  Cc: Mark Rutland, Paul E. McKenney, Catalin Marinas, Naresh Kamboju,
	Austin Kim, stable, syzkaller, Masami Hiramatsu, Will Deacon,
	linux-rt-devel, linux-arm-kernel

Hi all,

On 13/08/2025 11:06, Luis Claudio R. Goncalves wrote:
> On Wed, Aug 13, 2025 at 09:59:06AM +0100, Yeoreum Yun wrote:
>> +Ada Couprie Diaz
Thanks for the ping!
>>>> No. Ada's commit 31575e11ecf7 ("arm64: debug: split brk64 exception entry")
>>>> solves your splat: it splits the el0/el1 brk64 exception entries, so
>>>> el0_brk64() no longer calls debug_exception_enter().
>>>>
>>>> Formerly, el0_dbg() and el1_dbg() were both handled in do_debug_exception(),
>>>> which calls debug_exception_enter() and disables preemption. That is what
>>>> triggers your splat while handling a brk exception from el0.

That's correct: one of the goals of the series was to let each debug
exception handler be adapted to what it needs. That allowed us to keep
preemption enabled, or re-enable it much earlier, which prevents issues
like the one above for some exceptions.

>>> Do you think a fix is necessary if this issue also affects the LTS kernel
>>> before 6.17-rc1? As far as I know, most production RT kernels are still
>>> based on the existing LTS versions.
Luis originally reported the issue on kernels 6.13-rt and 6.14-rc1[1].
After some quick testing, the issue is present on
6.1-rt, 6.6-rt and 6.12-rt as well.
5.15-rt either doesn't have the issue, or doesn't report it.
>> IMHO, I think her patch should be backported.
> I also strongly suggest backporting Ada's patch series, as without them
> using anything that resorts to debug exceptions (ptrace, gdb, ...) on
> aarch64 with PREEMPT_RT enabled may result in a backtrace or worse.
>
> Luis
Hopefully it shouldn't be too hard to backport for recent kernels,
as I don't think those areas change a lot, but I haven't looked into it.

I'm not sure when I would have time to work on backporting, but
I'd be happy to help anyway or do it if I have the time in the future,
given there seems to be some interest (and good reasons).
Best,
Ada
[1]: https://lore.kernel.org/linux-arm-kernel/Z6YW_Kx4S2tmj2BP@uudg.org/
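
The idea described in this message, per-exception entry points that each
choose their own preemption handling, can be sketched in user space as
follows. This is an illustration under stated assumptions, not the code from
the series: all names ending in _model are hypothetical, and the "lock" is
reduced to a check of a preemption counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task preemption counter. */
static int preempt_count_model;

static void preempt_disable_model(void) { preempt_count_model++; }

static void preempt_enable_model(void)
{
    assert(preempt_count_model > 0);   /* catch unbalanced enable */
    preempt_count_model--;
}

/*
 * Models signal delivery that takes a sleeping lock.
 * Returns true if the lock would be taken in atomic context,
 * i.e. the situation the splat reports.
 */
static bool deliver_signal_model(void)
{
    return preempt_count_model != 0;
}

/* Shared entry: every debug exception runs with preemption disabled. */
static bool common_debug_entry_model(void)
{
    bool splat;

    preempt_disable_model();
    splat = deliver_signal_model();
    preempt_enable_model();
    return splat;
}

/* Split entry for a user-mode exception: preemption stays enabled. */
static bool el0_split_entry_model(void)
{
    return deliver_signal_model();
}
```

In this model common_debug_entry_model() returns true (the invalid-context
case) while el0_split_entry_model() returns false, which is the difference
a dedicated entry point makes.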

