public inbox for linux-kernel@vger.kernel.org
* [Question] Detecting Sleep-in-Atomic Context in PREEMPT_RT via RV (Runtime Verification) monitor rtapp:sleep
From: Yunseong Kim @ 2025-10-27  6:54 UTC (permalink / raw)
  To: Nam Cao
  Cc: Sebastian Andrzej Siewior, Tomas Glozar, Shung-Hsi Yu,
	Byungchul Park, syzkaller, linux-rt-devel, LKML

Hi Nam,

I've become very interested in using RV (Runtime Verification) to proactively
detect "sleep in atomic" scenarios on PREEMPT_RT kernels. Specifically, I'm
looking for ways to find cases where sleeping spinlocks or memory allocations
are used in preemption-disabled or irq-disabled contexts. I discovered the RV
subsystem while searching for solutions.

I tested it as follows, and I have a few questions.

# cat /sys/kernel/tracing/rv/available_monitors
wwnr
rtapp
rtapp:sleep

# cat /sys/kernel/tracing/rv/available_reactors
nop
printk
panic

# echo printk > /sys/kernel/tracing/rv/monitors/rtapp/sleep/reactors

# cat /sys/kernel/tracing/rv/monitors/rtapp/sleep/enable
1

# echo rtapp:sleep > /sys/kernel/tracing/rv/enabled_monitors

> [192735.309072] [   T6957] rv: sleep: multipathd[6957]: violation detected
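
For reference, the steps above can be wrapped in a small script. This is
only a sketch: the function names rv_set_reactor and rv_enable_monitor and
the RV_DIR variable are hypothetical helpers made up for illustration, not
part of any kernel tooling. It assumes the tracefs layout shown in the
commands above and simply checks the "available_*" files before writing.

```shell
#!/bin/sh
# Sketch only. RV_DIR defaults to the tracefs path used in the commands
# above; override it to point at a different mount (or a test directory).
RV_DIR="${RV_DIR:-/sys/kernel/tracing/rv}"

# rv_set_reactor <monitor-dir> <reactor>   (hypothetical helper)
# Example: rv_set_reactor rtapp/sleep printk
rv_set_reactor() {
    # Refuse reactors that are not listed in available_reactors.
    grep -qxF "$2" "$RV_DIR/available_reactors" || {
        echo "reactor '$2' not available" >&2
        return 1
    }
    echo "$2" > "$RV_DIR/monitors/$1/reactors"
}

# rv_enable_monitor <monitor-name>         (hypothetical helper)
# Example: rv_enable_monitor rtapp:sleep
rv_enable_monitor() {
    # Refuse monitors that are not listed in available_monitors.
    grep -qxF "$1" "$RV_DIR/available_monitors" || {
        echo "monitor '$1' not available" >&2
        return 1
    }
    # Same write as the "echo rtapp:sleep > .../enabled_monitors" step above.
    echo "$1" > "$RV_DIR/enabled_monitors"
}
```

Usage, as root, would be "rv_set_reactor rtapp/sleep printk" followed by
"rv_enable_monitor rtapp:sleep".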

# echo panic > /sys/kernel/tracing/rv/monitors/rtapp/sleep/reactors

> [ T6957] Kernel panic - not syncing: rv: sleep: multipathd[6957]: violation detected
> [193521.768666][ T6957] CPU: 4 UID: 0 PID: 6957 Comm: multipathd Not tainted 6.17.0-rc3-g39f90c196721 #1 PREEMPT_{RT,(full)}
> [193521.771727][ T6957] Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu3 10/08/2025
> [193521.774126][ T6957] Call trace:
> [193521.774998][ T6957]  show_stack+0x2c/0x3c (C)
> [193521.776281][ T6957]  __dump_stack+0x30/0x40
> [193521.777523][ T6957]  dump_stack_lvl+0x34/0x2bc
> [193521.778797][ T6957]  dump_stack+0x1c/0x48
> [193521.779984][ T6957]  vpanic+0x220/0x618
> [193521.781211][ T6957]  oom_killer_enable+0x0/0x30
> [193521.782512][ T6957]  ltl_validate+0x7ac/0xb1c
> [193521.783870][ T6957]  ltl_atom_update+0xd0/0x32c
> [193521.785198][ T6957]  handle_sched_set_state+0xb8/0x12c
> [193521.786773][ T6957]  __trace_set_current_state+0x128/0x174
> [193521.788450][ T6957]  do_nanosleep+0x128/0x2a4
> [193521.789731][ T6957]  hrtimer_nanosleep+0xb4/0x160
> [193521.791167][ T6957]  common_nsleep+0x6c/0x84
> [193521.792404][ T6957]  __arm64_sys_clock_nanosleep+0x1a8/0x1f0
> [193521.794031][ T6957]  invoke_syscall+0x64/0x168
> [193521.795353][ T6957]  el0_svc_common+0x134/0x164
> [193521.796707][ T6957]  do_el0_svc+0x2c/0x3c
> [193521.797897][ T6957]  el0_svc+0x58/0x184
> [193521.799048][ T6957]  el0t_64_sync_handler+0x84/0x12c
> [193521.800514][ T6957]  el0t_64_sync+0x1b8/0x1bc
> [193521.801818][ T6957] SMP: stopping secondary CPUs
> [193521.803320][ T6957] Dumping ftrace buffer:
> [193521.804510][ T6957]    (ftrace buffer empty)
> [193521.805848][ T6957] Kernel Offset: disabled
> [193521.807084][ T6957] CPU features: 0xc0000,00007800,149a3161,357ff667
> [193521.808941][ T6957] Memory Limit: none
> [193522.655297][ T6957] Rebooting in 86400 seconds..
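
To see which tasks trigger violations without panicking the machine, the
printk reactor output can be summarized from the kernel log. The sketch
below assumes the message format seen in the logs above ("rv: <monitor>:
<comm>[<pid>]: violation detected"); the function name rv_violations is a
hypothetical helper, and other kernel versions may format the message
differently.

```shell
#!/bin/sh
# rv_violations (hypothetical helper): read kernel log text on stdin and
# print one "<monitor> <comm> <pid>" line per RV violation message.
# Assumes the "rv: <monitor>: <comm>[<pid>]: violation detected" format
# shown in the log excerpts above.
rv_violations() {
    sed -n 's/.*rv: \([^:]*\): \(.*\)\[\([0-9][0-9]*\)\]: violation detected.*/\1 \2 \3/p'
}
```

For example, "dmesg | rv_violations | sort | uniq -c" gives a per-task
count of violations.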

Here are my questions:

1. Does the rtapp:sleep monitor proactively detect scenarios that
   could lead to sleeping in atomic context, perhaps before
   CONFIG_DEBUG_ATOMIC_SLEEP (if enabled) would trigger at the actual
   point of sleeping?

2. Is there a way to enable a monitor (e.g., rtapp:sleep) as soon as
   the RV subsystem comes up during boot, i.e., to have it turned on
   by default?

3. When a "violation detected" message occurs at runtime, is it
   possible to get a call stack of the location that triggered the
   violation? The panic reactor provides a full stack, but I'm
   wondering if this is also possible with the printk reactor.


Here is some background on why I'm so interested in this topic:

Recently, I was fuzzing the PREEMPT_RT kernel with syzkaller, but fuzzing
did not proceed smoothly. The cause turned out to be a problem in the kcov
USB API. I reported it, and it was fixed by Sebastian’s patch:

[PATCH] kcov, usb: Don't disable interrupts in kcov_remote_start_usb_softirq()
 - https://lore.kernel.org/all/20250811082745.ycJqBXMs@linutronix.de/

After this fix, syzkaller fuzzing ran well and was able to detect several
runtime "sleep in atomic context" bugs:

[PATCH] USB: gadget: dummy-hcd: Fix locking bug in RT-enabled kernels
 - https://lore.kernel.org/all/bb192ae2-4eee-48ee-981f-3efdbbd0d8f0@rowland.harvard.edu/

[BUG] usbip: vhci: Sleeping function called from invalid context in
vhci_urb_enqueue on PREEMPT_RT
 - https://lore.kernel.org/all/c6c17f0d-b71d-4a44-bcef-2b65e4d634f7@kzalloc.com/

This led me to look for ways to find these issues proactively through
static analysis, and I wrote some regex and Coccinelle scripts to
detect them.

[BUG] gfs2: sleeping lock in gfs2_quota_init() with preempt disabled
on PREEMPT_RT
 - https://lore.kernel.org/all/20250812103808.3mIVpgs9@linutronix.de/t/#u

[PATCH] md/raid5-ppl: Fix invalid context sleep in
ppl_io_unit_finished() on PREEMPT_RT
 - https://lore.kernel.org/all/f2dbf110-e2a7-4101-b24c-0444f708fd4e@kernel.org/t/#u

Tomas, the author of the rtlockscope project, also gave me some deep
insights into this static analysis approach.

Re: [WIP] coccinelle: rt: Add coccicheck on sleep in atomic context on
PREEMPT_RT
 - https://lore.kernel.org/all/CAP4=nvTOE9W+6UtVZ5-5gAoYeEQE8g4cgG602FJDPesNko-Bgw@mail.gmail.com/


Thank you!

Best regards,
Yunseong Kim


