* BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT
@ 2025-02-07 14:22 Luis Claudio R. Goncalves
2025-02-10 12:49 ` Mark Rutland
0 siblings, 1 reply; 9+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-02-07 14:22 UTC (permalink / raw)
To: linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon,
Sebastian Andrzej Siewior, Steven Rostedt, Mark Rutland,
Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly
Cc: linux-kernel
Hello!
While running ssdd[1] from rt-tests on an aarch64 kernel with PREEMPT_RT and
debug features enabled, this bug was triggered on every single run:
[1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/tree/src/ssdd/ssdd.c
# ssdd
[ 273.115597] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 273.115607] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6077, name: ssdd
[ 273.115611] preempt_count: 1, expected: 0
[ 273.115614] RCU nest depth: 0, expected: 0
[ 273.115617] 1 lock held by ssdd/6077:
[ 273.115620] #0: ffff07ffd77893e0 (&sighand->siglock){+.+.}-{3:3}, at: force_sig_info_to_task+0x58/0x200
[ 273.115642] Preemption disabled at:
[ 273.115644] [<ffffad24518bd1f4>] debug_exception_enter+0x1c/0x80
[ 273.115653] CPU: 47 UID: 0 PID: 6077 Comm: ssdd Not tainted 6.13.0-rt3 #1 PREEMPT_RT
[ 273.115659] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022
[ 273.115662] Call trace:
[ 273.115664] show_stack+0x34/0x98 (C)
[ 273.115670] dump_stack_lvl+0xa8/0xe8
[ 273.115675] dump_stack+0x1c/0x38
[ 273.115680] __might_resched+0x254/0x330
[ 273.115686] rt_spin_lock+0xcc/0x220
[ 273.115692] force_sig_info_to_task+0x58/0x200
[ 273.115697] force_sig_fault+0xd0/0x120
[ 273.115702] arm64_force_sig_fault+0x48/0x80
[ 273.115707] send_user_sigtrap+0x88/0xe8
[ 273.115712] single_step_handler+0x100/0x160
[ 273.115717] do_debug_exception+0x94/0x160
[ 273.115722] el0_dbg+0x54/0x150
[ 273.115727] el0t_64_sync_handler+0x134/0x138
[ 273.115732] el0t_64_sync+0x1ac/0x1b0
The ptrace usage in ssdd eventually exercises the code path that starts on
el0t_64_sync_handler() and may end up calling do_debug_exception(), which
calls debug_exception_enter() that disables preemption.
Looking at the backtrace, later in the call chain force_sig_info_to_task()
tries to take a spinlock, which on PREEMPT_RT becomes a rtmutex and could
sleep in case of contention. That triggers the "BUG: sleeping function
called from invalid context" warning.
It is also possible to reproduce the problem in an aarch64 kernel with
PREEMPT_RT enabled, no extra debug features, by running ssdd in a loop.
With that we can see not only the backtrace reported above but also other
instances where the process is scheduled out while preemption is disabled:
# while :; do ssdd; done
[ 754.673678] BUG: scheduling while atomic: ssdd/7340/0x00000002
[ 754.673682] Modules linked in: qrtr rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_dmc620_pmu arm_cmn cppc_cpufreq arm_dsu_pmu loop dm_multipath nfnetlink xfs nvme ghash_ce sha2_ce sha256_arm64 nvme_core sha1_ce nvme_auth sbsa_gwdt ast i2c_algo_bit i2c_designware_platform xgene_hwmon i2c_designware_core dm_mirror dm_region_hash dm_log dm_mod fuse
[ 754.673703] Preemption disabled at:
[ 754.673703] [<ffffa87a17ca470c>] do_debug_exception+0x54/0x100
[ 754.673710] CPU: 102 UID: 0 PID: 7340 Comm: ssdd Kdump: loaded Not tainted 6.14.0-rc1 #1
[ 754.673712] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022
[ 754.673713] Call trace:
[ 754.673714] show_stack+0x34/0x98 (C)
[ 754.673718] dump_stack_lvl+0x80/0xa8
[ 754.673721] dump_stack+0x18/0x2c
[ 754.673722] __schedule_bug+0x90/0xc0
[ 754.673726] schedule_debug.isra.0+0x128/0x158
[ 754.673728] __schedule+0x68/0x690
[ 754.673731] schedule_rtlock+0x24/0x50
[ 754.673733] rtlock_slowlock_locked+0x1c0/0x350
[ 754.673735] rt_spin_lock+0xcc/0x130
[ 754.673737] obj_cgroup_charge+0x54/0x138
[ 754.673740] __memcg_slab_post_alloc_hook+0xcc/0x300
[ 754.673743] kmem_cache_alloc_noprof+0x304/0x338
[ 754.673745] __send_signal_locked+0x90/0x428
[ 754.673748] send_signal_locked+0xe4/0x140
[ 754.673750] force_sig_info_to_task+0xd0/0x160
[ 754.673753] force_sig_fault+0x6c/0xa8
[ 754.673755] arm64_force_sig_fault+0x48/0x80
[ 754.673757] send_user_sigtrap+0x54/0xd0
[ 754.673759] single_step_handler+0xc4/0xe0
[ 754.673761] do_debug_exception+0x7c/0x100
[ 754.673762] el0_dbg+0x40/0x158
[ 754.673766] el0t_64_sync_handler+0x134/0x138
[ 754.673768] el0t_64_sync+0x1ac/0x1b0
In this case one of the local_lock_* calls in (the functions called by)
obj_cgroup_charge() seems to hit contention and, as it is dealing with
rtmutexes, be effectively scheduled out to sleep.
The scary comment on top of debug_exception_enter() provides a reason for
preemption being disabled at that point, but it seems to open a can of worms
for PREEMPT_RT usage:
/*
* In debug exception context, we explicitly disable preemption despite
* having interrupts disabled.
* This serves two purposes: it makes it much less likely that we would
* accidentally schedule in exception context and it will force a warning
* if we somehow manage to schedule by accident.
*/
This is the data I gathered so far, using both v6.13.0-rt3 and 6.14.0-rc1
for testing. But due to my ignorance wrt the debug exception treatment in
aarch64 I can't devise a solution for the observed behavior.
Any suggestions or comments?
Best regards,
Luis
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT 2025-02-07 14:22 BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT Luis Claudio R. Goncalves @ 2025-02-10 12:49 ` Mark Rutland 2025-02-10 14:06 ` Sebastian Andrzej Siewior 2025-02-11 14:34 ` Luis Claudio R. Goncalves 0 siblings, 2 replies; 9+ messages in thread From: Mark Rutland @ 2025-02-10 12:49 UTC (permalink / raw) To: Luis Claudio R. Goncalves Cc: linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon, Sebastian Andrzej Siewior, Steven Rostedt, Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly, linux-kernel On Fri, Feb 07, 2025 at 11:22:57AM -0300, Luis Claudio R. Goncalves wrote: > Hello! Hi, > While running ssdd[1] from rt-tests on an aarch64 kernel with PREEMPT_RT and > debug features enabled, this bug was triggered on every single run: > > [1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/tree/src/ssdd/ssdd.c > > # ssdd > > [ 273.115597] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 > [ 273.115607] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6077, name: ssdd > [ 273.115611] preempt_count: 1, expected: 0 > [ 273.115614] RCU nest depth: 0, expected: 0 > [ 273.115617] 1 lock held by ssdd/6077: > [ 273.115620] #0: ffff07ffd77893e0 (&sighand->siglock){+.+.}-{3:3}, at: force_sig_info_to_task+0x58/0x200 > [ 273.115642] Preemption disabled at: > [ 273.115644] [<ffffad24518bd1f4>] debug_exception_enter+0x1c/0x80 > [ 273.115653] CPU: 47 UID: 0 PID: 6077 Comm: ssdd Not tainted 6.13.0-rt3 #1 PREEMPT_RT > [ 273.115659] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 > [ 273.115662] Call trace: > [ 273.115664] show_stack+0x34/0x98 (C) > [ 273.115670] dump_stack_lvl+0xa8/0xe8 > [ 273.115675] dump_stack+0x1c/0x38 > [ 273.115680] __might_resched+0x254/0x330 > [ 273.115686] rt_spin_lock+0xcc/0x220 > [ 273.115692] force_sig_info_to_task+0x58/0x200 > [ 273.115697] force_sig_fault+0xd0/0x120 > [ 273.115702] arm64_force_sig_fault+0x48/0x80 > [ 273.115707] send_user_sigtrap+0x88/0xe8 > [ 273.115712] single_step_handler+0x100/0x160 > [ 273.115717] do_debug_exception+0x94/0x160 > [ 273.115722] el0_dbg+0x54/0x150 > [ 273.115727] el0t_64_sync_handler+0x134/0x138 > [ 273.115732] el0t_64_sync+0x1ac/0x1b0 > > The ptrace usage in ssdd eventually exercises the code path that starts on > el0t_64_sync_handler() and may end up calling do_debug_exception(), which > calls debug_exception_enter() that disables preemption. > > Looking at the backtrace, later in the call chain force_sig_info_to_task() > tries to take a spinlock, which on PREEMPT_RT becomes a rtmutex and could > sleep in case of contention. That triggers the "BUG: sleeping function > called from invalid context" warning. > > It is also possible to reproduce the problem in an aarch64 kernel with > PREEMPT_RT enabled, no extra debug features, by running ssdd in a loop. > With that we can see not only the backtrace reported above but also other > instances where the process is scheduled out while preemption is disabled: > > # while :; do ssdd; done > > [ 754.673678] BUG: scheduling while atomic: ssdd/7340/0x00000002 > [ 754.673682] Modules linked in: qrtr rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_dmc620_pmu arm_cmn cppc_cpufreq arm_dsu_pmu loop dm_multipath nfnetlink xfs nvme ghash_ce sha2_ce sha256_arm64 nvme_core sha1_ce nvme_auth sbsa_gwdt ast i2c_algo_bit i2c_designware_platform xgene_hwmon i2c_designware_core dm_mirror dm_region_hash dm_log dm_mod fuse > [ 754.673703] Preemption disabled at: > [ 754.673703] [<ffffa87a17ca470c>] do_debug_exception+0x54/0x100 > [ 754.673710] CPU: 102 UID: 0 PID: 7340 Comm: ssdd Kdump: loaded Not tainted 6.14.0-rc1 #1 > [ 754.673712] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 > [ 754.673713] Call trace: > [ 754.673714] show_stack+0x34/0x98 (C) > [ 754.673718] dump_stack_lvl+0x80/0xa8 > [ 754.673721] dump_stack+0x18/0x2c > [ 754.673722] __schedule_bug+0x90/0xc0 > [ 754.673726] schedule_debug.isra.0+0x128/0x158 > [ 754.673728] __schedule+0x68/0x690 > [ 754.673731] schedule_rtlock+0x24/0x50 > [ 754.673733] rtlock_slowlock_locked+0x1c0/0x350 > [ 754.673735] rt_spin_lock+0xcc/0x130 > [ 754.673737] obj_cgroup_charge+0x54/0x138 > [ 754.673740] __memcg_slab_post_alloc_hook+0xcc/0x300 > [ 754.673743] kmem_cache_alloc_noprof+0x304/0x338 > [ 754.673745] __send_signal_locked+0x90/0x428 > [ 754.673748] send_signal_locked+0xe4/0x140 > [ 754.673750] force_sig_info_to_task+0xd0/0x160 > [ 754.673753] force_sig_fault+0x6c/0xa8 > [ 754.673755] arm64_force_sig_fault+0x48/0x80 > [ 754.673757] send_user_sigtrap+0x54/0xd0 > [ 754.673759] single_step_handler+0xc4/0xe0 > [ 754.673761] do_debug_exception+0x7c/0x100 > [ 754.673762] el0_dbg+0x40/0x158 > [ 754.673766] el0t_64_sync_handler+0x134/0x138 > [ 754.673768] el0t_64_sync+0x1ac/0x1b0 > > In this case one of the local_lock_* calls in (the functions called by) > obj_cgroup_charge() seems to hit contention and, as it is dealing with > rtmutexes, be effectively scheduled out to sleep. > > The scary comment on top of debug_exception_enter() provides a reason for > preemption being disabled at that point, but it seems to open a can of worms > for PREEMPT_RT usage: > > /* > * In debug exception context, we explicitly disable preemption despite > * having interrupts disabled. > * This serves two purposes: it makes it much less likely that we would > * accidentally schedule in exception context and it will force a warning > * if we somehow manage to schedule by accident. > */ > > This is the data I gathered so far, using both v6.13.0-rt3 and 6.14.0-rc1 > for testing. But due to my ignorance wrt the debug exception treatment in > aarch64 I can't devise a solution for the observed behavior. > > Any suggestions or comments? I don't have an immediate suggestion; I'll need to go think about this for a bit. Unfortunatealy, there are several nested cans of worms here. :/ In theory, we can go split out the EL0 "debug exceptions" into separate handlers, and wouldn't generally need to disable preemption for things like BRK or single-step. However, it's not immediately clear to me how we could handle watchpoints or breakpoints, since for those preemption/interruption could change the HW state under our feet, and we rely on single-step to skip past the watchpoint/breakpoint after it is handled. That, and last I looked reworking this we'd need to do a larger rework to split out those "debug exceptions" because of that way that currently bounces through the fault handling ligic in arch/arm64/mm/. Mark. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT 2025-02-10 12:49 ` Mark Rutland @ 2025-02-10 14:06 ` Sebastian Andrzej Siewior 2025-02-12 0:48 ` Luis Claudio R. Goncalves 2025-02-11 14:34 ` Luis Claudio R. Goncalves 1 sibling, 1 reply; 9+ messages in thread From: Sebastian Andrzej Siewior @ 2025-02-10 14:06 UTC (permalink / raw) To: Mark Rutland Cc: Luis Claudio R. Goncalves, linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon, Steven Rostedt, Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly, linux-kernel On 2025-02-10 12:49:45 [+0000], Mark Rutland wrote: > Hi, Hi, > I don't have an immediate suggestion; I'll need to go think about this > for a bit. Unfortunatealy, there are several nested cans of worms here. > :/ > > In theory, we can go split out the EL0 "debug exceptions" into separate > handlers, and wouldn't generally need to disable preemption for things > like BRK or single-step. > > However, it's not immediately clear to me how we could handle > watchpoints or breakpoints, since for those preemption/interruption > could change the HW state under our feet, and we rely on single-step to > skip past the watchpoint/breakpoint after it is handled. Couldn't you delay sending signals until after the preempt-disable section? > That, and last I looked reworking this we'd need to do a larger rework > to split out those "debug exceptions" because of that way that currently > bounces through the fault handling ligic in arch/arm64/mm/. > > Mark. Sebastian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT 2025-02-10 14:06 ` Sebastian Andrzej Siewior @ 2025-02-12 0:48 ` Luis Claudio R. Goncalves 2025-02-12 11:21 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 9+ messages in thread From: Luis Claudio R. Goncalves @ 2025-02-12 0:48 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Mark Rutland, linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon, Steven Rostedt, Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly, linux-kernel On Mon, Feb 10, 2025 at 03:06:57PM +0100, Sebastian Andrzej Siewior wrote: > On 2025-02-10 12:49:45 [+0000], Mark Rutland wrote: > > Hi, > Hi, > > > I don't have an immediate suggestion; I'll need to go think about this > > for a bit. Unfortunatealy, there are several nested cans of worms here. > > :/ > > > > In theory, we can go split out the EL0 "debug exceptions" into separate > > handlers, and wouldn't generally need to disable preemption for things > > like BRK or single-step. > > > > However, it's not immediately clear to me how we could handle > > watchpoints or breakpoints, since for those preemption/interruption > > could change the HW state under our feet, and we rely on single-step to > > skip past the watchpoint/breakpoint after it is handled. > > Couldn't you delay sending signals until after the preempt-disable > section? Looking at do_debug_exception, void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr, struct pt_regs *regs) { const struct fault_info *inf = esr_to_debug_fault_info(esr); unsigned long pc = instruction_pointer(regs); debug_exception_enter(regs); if (user_mode(regs) && !is_ttbr0_addr(pc)) arm64_apply_bp_hardening(); if (inf->fn(addr_if_watchpoint, esr, regs)) { arm64_notify_die(inf->name, regs, inf->sig, inf->code, pc, esr); } debug_exception_exit(regs); } NOKPROBE_SYMBOL(do_debug_exception); Do you mean executing the arm64_notify_die(inf->name, regs, inf->sig, inf->code, pc, esr); after re-enabling the preemption or do you mean something more sophisticated? Luis > > That, and last I looked reworking this we'd need to do a larger rework > > to split out those "debug exceptions" because of that way that currently > > bounces through the fault handling ligic in arch/arm64/mm/. > > > > Mark. > > Sebastian > ---end quoted text--- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT 2025-02-12 0:48 ` Luis Claudio R. Goncalves @ 2025-02-12 11:21 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 9+ messages in thread From: Sebastian Andrzej Siewior @ 2025-02-12 11:21 UTC (permalink / raw) To: Luis Claudio R. Goncalves Cc: Mark Rutland, linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon, Steven Rostedt, Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly, linux-kernel On 2025-02-11 21:48:08 [-0300], Luis Claudio R. Goncalves wrote: > > Couldn't you delay sending signals until after the preempt-disable > > section? > > Looking at do_debug_exception, > > void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr, > struct pt_regs *regs) > { > const struct fault_info *inf = esr_to_debug_fault_info(esr); > unsigned long pc = instruction_pointer(regs); > > debug_exception_enter(regs); > > if (user_mode(regs) && !is_ttbr0_addr(pc)) > arm64_apply_bp_hardening(); > > if (inf->fn(addr_if_watchpoint, esr, regs)) { > arm64_notify_die(inf->name, regs, inf->sig, inf->code, pc, esr); > } > > debug_exception_exit(regs); > } > NOKPROBE_SYMBOL(do_debug_exception); > > > Do you mean executing the > > arm64_notify_die(inf->name, regs, inf->sig, inf->code, pc, esr); > > after re-enabling the preemption or do you mean something more > sophisticated? single_step_handler() and brk_handler() can both send signals. If inf->fn returns != 0 (which seems to be only be possible for brk_handler() out of the possible four callbacks) this this arm64_notify_die() sends a signal. So we have three potential signal delivers. It shouldn't matter if the signal is sent within the preempt-disable section or outside because the signal will be queued for current and delivered on the return to userland. So would just need to save the details and do it outside. Now that I actually looked at the code, this might lead to a bit of a mess so splitting out the signal part and avoiding the preempt-off part might better which would be the rework that Mark was talking about. > Luis Sebastian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT 2025-02-10 12:49 ` Mark Rutland 2025-02-10 14:06 ` Sebastian Andrzej Siewior @ 2025-02-11 14:34 ` Luis Claudio R. Goncalves 2025-02-12 0:35 ` Luis Claudio R. Goncalves 1 sibling, 1 reply; 9+ messages in thread From: Luis Claudio R. Goncalves @ 2025-02-11 14:34 UTC (permalink / raw) To: Mark Rutland Cc: linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon, Sebastian Andrzej Siewior, Steven Rostedt, Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly, linux-kernel On Mon, Feb 10, 2025 at 12:49:45PM +0000, Mark Rutland wrote: > On Fri, Feb 07, 2025 at 11:22:57AM -0300, Luis Claudio R. Goncalves wrote: > > Hello! > > Hi, > > > While running ssdd[1] from rt-tests on an aarch64 kernel with PREEMPT_RT and > > debug features enabled, this bug was triggered on every single run: > > > > [1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/tree/src/ssdd/ssdd.c > > > > # ssdd > > > > [ 273.115597] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 > > [ 273.115607] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6077, name: ssdd > > [ 273.115611] preempt_count: 1, expected: 0 > > [ 273.115614] RCU nest depth: 0, expected: 0 > > [ 273.115617] 1 lock held by ssdd/6077: > > [ 273.115620] #0: ffff07ffd77893e0 (&sighand->siglock){+.+.}-{3:3}, at: force_sig_info_to_task+0x58/0x200 > > [ 273.115642] Preemption disabled at: > > [ 273.115644] [<ffffad24518bd1f4>] debug_exception_enter+0x1c/0x80 > > [ 273.115653] CPU: 47 UID: 0 PID: 6077 Comm: ssdd Not tainted 6.13.0-rt3 #1 PREEMPT_RT > > [ 273.115659] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 > > [ 273.115662] Call trace: > > [ 273.115664] show_stack+0x34/0x98 (C) > > [ 273.115670] dump_stack_lvl+0xa8/0xe8 > > [ 273.115675] dump_stack+0x1c/0x38 > > [ 273.115680] __might_resched+0x254/0x330 > > [ 273.115686] rt_spin_lock+0xcc/0x220 > > [ 273.115692] force_sig_info_to_task+0x58/0x200 > > [ 273.115697] force_sig_fault+0xd0/0x120 > > [ 273.115702] arm64_force_sig_fault+0x48/0x80 > > [ 273.115707] send_user_sigtrap+0x88/0xe8 > > [ 273.115712] single_step_handler+0x100/0x160 > > [ 273.115717] do_debug_exception+0x94/0x160 > > [ 273.115722] el0_dbg+0x54/0x150 > > [ 273.115727] el0t_64_sync_handler+0x134/0x138 > > [ 273.115732] el0t_64_sync+0x1ac/0x1b0 > > > > The ptrace usage in ssdd eventually exercises the code path that starts on > > el0t_64_sync_handler() and may end up calling do_debug_exception(), which > > calls debug_exception_enter() that disables preemption. > > > > Looking at the backtrace, later in the call chain force_sig_info_to_task() > > tries to take a spinlock, which on PREEMPT_RT becomes a rtmutex and could > > sleep in case of contention. That triggers the "BUG: sleeping function > > called from invalid context" warning. > > > > It is also possible to reproduce the problem in an aarch64 kernel with > > PREEMPT_RT enabled, no extra debug features, by running ssdd in a loop. > > With that we can see not only the backtrace reported above but also other > > instances where the process is scheduled out while preemption is disabled: > > > > # while :; do ssdd; done > > > > [ 754.673678] BUG: scheduling while atomic: ssdd/7340/0x00000002 > > [ 754.673682] Modules linked in: qrtr rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_dmc620_pmu arm_cmn cppc_cpufreq arm_dsu_pmu loop dm_multipath nfnetlink xfs nvme ghash_ce sha2_ce sha256_arm64 nvme_core sha1_ce nvme_auth sbsa_gwdt ast i2c_algo_bit i2c_designware_platform xgene_hwmon i2c_designware_core dm_mirror dm_region_hash dm_log dm_mod fuse > > [ 754.673703] Preemption disabled at: > > [ 754.673703] [<ffffa87a17ca470c>] do_debug_exception+0x54/0x100 > > [ 754.673710] CPU: 102 UID: 0 PID: 7340 Comm: ssdd Kdump: loaded Not tainted 6.14.0-rc1 #1 > > [ 754.673712] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022 > > [ 754.673713] Call trace: > > [ 754.673714] show_stack+0x34/0x98 (C) > > [ 754.673718] dump_stack_lvl+0x80/0xa8 > > [ 754.673721] dump_stack+0x18/0x2c > > [ 754.673722] __schedule_bug+0x90/0xc0 > > [ 754.673726] schedule_debug.isra.0+0x128/0x158 > > [ 754.673728] __schedule+0x68/0x690 > > [ 754.673731] schedule_rtlock+0x24/0x50 > > [ 754.673733] rtlock_slowlock_locked+0x1c0/0x350 > > [ 754.673735] rt_spin_lock+0xcc/0x130 > > [ 754.673737] obj_cgroup_charge+0x54/0x138 > > [ 754.673740] __memcg_slab_post_alloc_hook+0xcc/0x300 > > [ 754.673743] kmem_cache_alloc_noprof+0x304/0x338 > > [ 754.673745] __send_signal_locked+0x90/0x428 > > [ 754.673748] send_signal_locked+0xe4/0x140 > > [ 754.673750] force_sig_info_to_task+0xd0/0x160 > > [ 754.673753] force_sig_fault+0x6c/0xa8 > > [ 754.673755] arm64_force_sig_fault+0x48/0x80 > > [ 754.673757] send_user_sigtrap+0x54/0xd0 > > [ 754.673759] single_step_handler+0xc4/0xe0 > > [ 754.673761] do_debug_exception+0x7c/0x100 > > [ 754.673762] el0_dbg+0x40/0x158 > > [ 754.673766] el0t_64_sync_handler+0x134/0x138 > > [ 754.673768] el0t_64_sync+0x1ac/0x1b0 > > > > In this case one of the local_lock_* calls in (the functions called by) > > obj_cgroup_charge() seems to hit contention and, as it is dealing with > > rtmutexes, be effectively scheduled out to sleep. > > > > The scary comment on top of debug_exception_enter() provides a reason for > > preemption being disabled at that point, but it seems to open a can of worms > > for PREEMPT_RT usage: > > > > /* > > * In debug exception context, we explicitly disable preemption despite > > * having interrupts disabled. > > * This serves two purposes: it makes it much less likely that we would > > * accidentally schedule in exception context and it will force a warning > > * if we somehow manage to schedule by accident. > > */ > > > > This is the data I gathered so far, using both v6.13.0-rt3 and 6.14.0-rc1 > > for testing. But due to my ignorance wrt the debug exception treatment in > > aarch64 I can't devise a solution for the observed behavior. > > > > Any suggestions or comments? > > I don't have an immediate suggestion; I'll need to go think about this > for a bit. Unfortunatealy, there are several nested cans of worms here. > :/ > > In theory, we can go split out the EL0 "debug exceptions" into separate > handlers, and wouldn't generally need to disable preemption for things > like BRK or single-step. If this is an acceptable workaround, until we have the real solution, I can work on that :) Luis > However, it's not immediately clear to me how we could handle > watchpoints or breakpoints, since for those preemption/interruption > could change the HW state under our feet, and we rely on single-step to > skip past the watchpoint/breakpoint after it is handled. > > That, and last I looked reworking this we'd need to do a larger rework > to split out those "debug exceptions" because of that way that currently > bounces through the fault handling ligic in arch/arm64/mm/. > > Mark. > ---end quoted text--- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT 2025-02-11 14:34 ` Luis Claudio R. Goncalves @ 2025-02-12 0:35 ` Luis Claudio R. Goncalves 2025-02-12 12:40 ` Mark Rutland 0 siblings, 1 reply; 9+ messages in thread From: Luis Claudio R. Goncalves @ 2025-02-12 0:35 UTC (permalink / raw) To: Mark Rutland Cc: linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon, Sebastian Andrzej Siewior, Steven Rostedt, Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly, linux-kernel On Tue, Feb 11, 2025 at 11:34:26AM -0300, Luis Claudio R. Goncalves wrote: > On Mon, Feb 10, 2025 at 12:49:45PM +0000, Mark Rutland wrote: > > On Fri, Feb 07, 2025 at 11:22:57AM -0300, Luis Claudio R. Goncalves wrote: ... > > I don't have an immediate suggestion; I'll need to go think about this > > for a bit. Unfortunatealy, there are several nested cans of worms here. > > :/ > > > > In theory, we can go split out the EL0 "debug exceptions" into separate > > handlers, and wouldn't generally need to disable preemption for things > > like BRK or single-step. > > If this is an acceptable workaround, until we have the real solution, > I can work on that :) > > Luis I tested the prototype below and it survived 6h of ssdd and the ptrace LTP tests running simultaneously, in a tight loop. Would something along these lines be an acceptable workaround? diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 8b281cf308b30..eb3b54710024f 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -933,18 +933,20 @@ void __init hook_debug_fault_code(int nr, * accidentally schedule in exception context and it will force a warning * if we somehow manage to schedule by accident. */ -static void debug_exception_enter(struct pt_regs *regs) +static void debug_exception_enter(struct pt_regs *regs, int touch_preemption) { - preempt_disable(); + if (touch_preemption) + preempt_disable(); /* This code is a bit fragile. Test it. */ RCU_LOCKDEP_WARN(!rcu_is_watching(), "exception_enter didn't work"); } NOKPROBE_SYMBOL(debug_exception_enter); -static void debug_exception_exit(struct pt_regs *regs) +static void debug_exception_exit(struct pt_regs *regs, int touch_preemption) { - preempt_enable_no_resched(); + if (touch_preemption) + preempt_enable_no_resched(); } NOKPROBE_SYMBOL(debug_exception_exit); @@ -953,8 +955,14 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr, { const struct fault_info *inf = esr_to_debug_fault_info(esr); unsigned long pc = instruction_pointer(regs); + unsigned long req = ESR_ELx_EC(esr); + int touch_preemption; - debug_exception_enter(regs); + touch_preemption = !(IS_ENABLED(CONFIG_PREEMPT_RT) && + (req == ESR_ELx_EC_SOFTSTP_LOW || req == ESR_ELx_EC_BRK64 + || req == ESR_ELx_EC_BKPT32 || req == ESR_ELx_EC_SOFTSTP_CUR)); + + debug_exception_enter(regs, touch_preemption); if (user_mode(regs) && !is_ttbr0_addr(pc)) arm64_apply_bp_hardening(); @@ -963,7 +971,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr, arm64_notify_die(inf->name, regs, inf->sig, inf->code, pc, esr); } - debug_exception_exit(regs); + debug_exception_exit(regs, touch_preemption); } NOKPROBE_SYMBOL(do_debug_exception); ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT 2025-02-12 0:35 ` Luis Claudio R. Goncalves @ 2025-02-12 12:40 ` Mark Rutland 2025-02-12 13:07 ` Mark Rutland 0 siblings, 1 reply; 9+ messages in thread From: Mark Rutland @ 2025-02-12 12:40 UTC (permalink / raw) To: Luis Claudio R. Goncalves Cc: linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon, Sebastian Andrzej Siewior, Steven Rostedt, Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly, linux-kernel On Tue, Feb 11, 2025 at 09:35:40PM -0300, Luis Claudio R. Goncalves wrote: > On Tue, Feb 11, 2025 at 11:34:26AM -0300, Luis Claudio R. Goncalves wrote: > > On Mon, Feb 10, 2025 at 12:49:45PM +0000, Mark Rutland wrote: > > > On Fri, Feb 07, 2025 at 11:22:57AM -0300, Luis Claudio R. Goncalves wrote: > ... > > > I don't have an immediate suggestion; I'll need to go think about this > > > for a bit. Unfortunatealy, there are several nested cans of worms here. > > > :/ > > > > > > In theory, we can go split out the EL0 "debug exceptions" into separate > > > handlers, and wouldn't generally need to disable preemption for things > > > like BRK or single-step. > > > > If this is an acceptable workaround, until we have the real solution, > > I can work on that :) > > > > Luis > > I tested the prototype below and it survived 6h of ssdd and the ptrace LTP > tests running simultaneously, in a tight loop. Would something along these > lines be an acceptable workaround? Sorry, no. This makes the code even more convoluted and less maintainable. If you want to fix single step specifically without bothering with the rest of the rework I mentioned, the right fix is to split that out from the other "debug exceptions", and handle that with the usual structure we have for other synchronous exceptions like FPAC/BTI/etc: * In arch/arm64/kernel/debug-monitors.c, add new do_el{1,0}_step() functions which handle their corresponding stepping logic. These should have the same rough shape as do_el{1,0}_fpac(), and should handle all the default signalling logic that would otherwise be handled by do_debug_exception(). The existing single_step_handler() should be removed, or at minimum refactored and only called by those new do_el{1,0}_step() functions. The existing hook_debug_fault_code() registration of single_step_handler() should be removed. I'm not sure whether do_el0_step() needs the arm64_apply_bp_hardening() logic, but do_el1_step() does not. * In entry-common.c, add new el{1,0}_step() functions. Each of el1h_64_sync_handler(), el0t_64_sync_handler(), and el0t_32_sync_handler() should be updated to call those rather than el{1,0}_dbg() for the corresponding EC values. In el0_step() it shouldn't be necessary to disable preemption, and that should be able to be: | static void noinstr el0_step(struct pt_regs *regs, unsigned long esr) | { | enter_from_user_mode(regs); | local_daif_restore(DAIF_PROCCTX); | do_el0_step(regs, esr); | exit_to_user_mode(regs); | } In el1_step(), I'm not *immediately sure* whether it's necessary to disable preemption, nor whether we need to treat this and use arm64_enter_el1_dbg() and arm64_exit_el1_dbg() rather than entry_from_kenrel_mode() and exit_to_kernel_mode(). So we either need: | static void noinstr el1_step(struct pt_regs *regs, unsigned long esr) | { | arm64_enter_el1_dbg(regs); | do_el1_step(regs, esr); | arm64_exit_el1_dbg(regs); | } ... or: | static void noinstr el1_step(struct pt_regs *regs, unsigned long esr) | { | enter_from_kernel_mode(regs); | local_daif_inherit(regs); | do_el1_step(regs, esr); | local_daif_mask(regs); | exit_to_kernel_mode(regs); | } With that, we're a step forward in the right direction. Mark. > > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > index 8b281cf308b30..eb3b54710024f 100644 > --- a/arch/arm64/mm/fault.c > +++ b/arch/arm64/mm/fault.c > @@ -933,18 +933,20 @@ void __init hook_debug_fault_code(int nr, > * accidentally schedule in exception context and it will force a warning > * if we somehow manage to schedule by accident. > */ > -static void debug_exception_enter(struct pt_regs *regs) > +static void debug_exception_enter(struct pt_regs *regs, int touch_preemption) > { > - preempt_disable(); > + if (touch_preemption) > + preempt_disable(); > > /* This code is a bit fragile. Test it. */ > RCU_LOCKDEP_WARN(!rcu_is_watching(), "exception_enter didn't work"); > } > NOKPROBE_SYMBOL(debug_exception_enter); > > -static void debug_exception_exit(struct pt_regs *regs) > +static void debug_exception_exit(struct pt_regs *regs, int touch_preemption) > { > - preempt_enable_no_resched(); > + if (touch_preemption) > + preempt_enable_no_resched(); > } > NOKPROBE_SYMBOL(debug_exception_exit); > > @@ -953,8 +955,14 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr, > { > const struct fault_info *inf = esr_to_debug_fault_info(esr); > unsigned long pc = instruction_pointer(regs); > + unsigned long req = ESR_ELx_EC(esr); > + int touch_preemption; > > - debug_exception_enter(regs); > + touch_preemption = !(IS_ENABLED(CONFIG_PREEMPT_RT) && > + (req == ESR_ELx_EC_SOFTSTP_LOW || req == ESR_ELx_EC_BRK64 > + || req == ESR_ELx_EC_BKPT32 || req == ESR_ELx_EC_SOFTSTP_CUR)); > + > + debug_exception_enter(regs, touch_preemption); > > if (user_mode(regs) && !is_ttbr0_addr(pc)) > arm64_apply_bp_hardening(); > @@ -963,7 +971,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr, > arm64_notify_die(inf->name, regs, inf->sig, inf->code, pc, esr); > } > > - debug_exception_exit(regs); > + debug_exception_exit(regs, touch_preemption); > } > NOKPROBE_SYMBOL(do_debug_exception); > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT 2025-02-12 12:40 ` Mark Rutland @ 2025-02-12 13:07 ` Mark Rutland 0 siblings, 0 replies; 9+ messages in thread From: Mark Rutland @ 2025-02-12 13:07 UTC (permalink / raw) To: Luis Claudio R. Goncalves Cc: linux-arm-kernel, linux-rt-devel, Catalin Marinas, Will Deacon, Sebastian Andrzej Siewior, Steven Rostedt, Ryan Roberts, Mark Brown, Ard Biesheuvel, Joey Gouly, linux-kernel On Wed, Feb 12, 2025 at 12:40:33PM +0000, Mark Rutland wrote: > * In entry-common.c, add new el{1,0}_step() functions. Each of > el1h_64_sync_handler(), el0t_64_sync_handler(), and > el0t_32_sync_handler() should be updated to call those rather than > el{1,0}_dbg() for the corresponding EC values. > > In el0_step() it shouldn't be necessary to disable preemption, and > that should be able to be: > > | static void noinstr el0_step(struct pt_regs *regs, unsigned long esr) > | { > | enter_from_user_mode(regs); > | local_daif_restore(DAIF_PROCCTX); > | do_el0_step(regs, esr); > | exit_to_user_mode(regs); > | } > > In el1_step(), I'm not *immediately sure* whether it's necessary to > disable preemption, nor whether we need to treat this and use > arm64_enter_el1_dbg() and arm64_exit_el1_dbg() rather than > entry_from_kenrel_mode() and exit_to_kernel_mode(). From another look, some care will need to be taken around reinstall_suspended_bps(), which will also need to be reworked. That definitely needs preemption disabled when poking the HW breakpoints, and today those can't change under our feet between entry and handling, so we'll need to think very hard about how that needs to work. Note that care needs to be taken with *any* approach that doesn't disable preemption. Mark. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-02-12 13:09 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-07 14:22 BUG: debug_exception_enter() disables preemption and may call sleeping functions on aarch64 with RT Luis Claudio R. Goncalves 2025-02-10 12:49 ` Mark Rutland 2025-02-10 14:06 ` Sebastian Andrzej Siewior 2025-02-12 0:48 ` Luis Claudio R. Goncalves 2025-02-12 11:21 ` Sebastian Andrzej Siewior 2025-02-11 14:34 ` Luis Claudio R. Goncalves 2025-02-12 0:35 ` Luis Claudio R. Goncalves 2025-02-12 12:40 ` Mark Rutland 2025-02-12 13:07 ` Mark Rutland
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).