* [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
@ 2025-07-25 20:14 Yunseong Kim
2025-07-26 6:36 ` Greg Kroah-Hartman
2025-08-08 16:33 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 9+ messages in thread
From: Yunseong Kim @ 2025-07-25 20:14 UTC (permalink / raw)
To: Dmitry Vyukov, Andrey Konovalov
Cc: Byungchul Park, max.byungchul.park, Yeoreum Yun, Michelle Jin,
linux-kernel, Yunseong Kim, Tetsuo Handa, Alan Stern,
Greg Kroah-Hartman, Thomas Gleixner, Sebastian Andrzej Siewior,
stable, kasan-dev, syzkaller, linux-usb, linux-rt-devel
When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, the following
bug is triggered in the ksoftirqd context.
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 30, name: ksoftirqd/1
| preempt_count: 0, expected: 0
| RCU nest depth: 2, expected: 2
| CPU: 1 UID: 0 PID: 30 Comm: ksoftirqd/1 Tainted: G W 6.16.0-rc1-rt1 #11 PREEMPT_RT
| Tainted: [W]=WARN
| Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
| Call trace:
| show_stack+0x2c/0x3c (C)
| __dump_stack+0x30/0x40
| dump_stack_lvl+0x148/0x1d8
| dump_stack+0x1c/0x3c
| __might_resched+0x2e4/0x52c
| rt_spin_lock+0xa8/0x1bc
| kcov_remote_start+0xb0/0x490
| __usb_hcd_giveback_urb+0x2d0/0x5e8
| usb_giveback_urb_bh+0x234/0x3c4
| process_scheduled_works+0x678/0xd18
| bh_worker+0x2f0/0x59c
| workqueue_softirq_action+0x104/0x14c
| tasklet_action+0x18/0x8c
| handle_softirqs+0x208/0x63c
| run_ksoftirqd+0x64/0x264
| smpboot_thread_fn+0x4ac/0x908
| kthread+0x5e8/0x734
| ret_from_fork+0x10/0x20
To reproduce on a PREEMPT_RT kernel:
$ git remote add rt-devel git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
$ git fetch rt-devel
$ git checkout -b v6.16-rc1-rt1 v6.16-rc1-rt1
I have attached the syzlang and the C source code converted by syz-prog2c:
Link: https://gist.github.com/kzall0c/9455aaa246f4aa1135353a51753adbbe
Then, run with a PREEMPT_RT config.
This issue was introduced by commit
f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq"),
which disables interrupts around the remote coverage collection section.
This creates a conflict on PREEMPT_RT kernels. The local_irq_save() call
establishes an atomic context where sleeping is forbidden. Inside this
context, kcov_remote_start() is called, which on PREEMPT_RT uses sleeping
locks (spinlock_t and local_lock_t are mapped to rt_mutex). This results in
the "sleeping function called from invalid context" BUG shown above.
On PREEMPT_RT, interrupt handlers are threaded, so the re-entrancy scenario
is already safely handled by the existing local_lock_t and the global
kcov_remote_lock within kcov_remote_start(). Therefore, the outer
local_irq_save() is not necessary.
Making the local_irq_save()/local_irq_restore() pair conditional on
!CONFIG_PREEMPT_RT preserves the intended re-entrancy protection for non-RT
kernels while resolving the locking violation on PREEMPT_RT kernels.
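The conflict can be illustrated with a small userspace model. This is only a
sketch and not kernel code: every toy_* name is hypothetical, the "context" is
a single flag, and the sleeping lock is reduced to a check of that flag, on the
assumption that this is the property __might_resched() complains about.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: tracks whether "interrupts" are off and counts violations. */
struct toy_ctx {
	bool irqs_disabled;
	int violations;
};

static void toy_local_irq_save(struct toy_ctx *c)
{
	c->irqs_disabled = true;
}

static void toy_local_irq_restore(struct toy_ctx *c)
{
	assert(c->irqs_disabled);	/* a restore must pair with a save */
	c->irqs_disabled = false;
}

/* Stand-in for taking a lock that may sleep, as spinlock_t does on RT. */
static void toy_sleeping_lock(struct toy_ctx *c)
{
	if (c->irqs_disabled)
		c->violations++;	/* the case __might_resched() reports */
}

/*
 * Shape of the softirq helper pair. Returns the number of violations seen.
 * preempt_rt models CONFIG_PREEMPT_RT=y, patched models this patch applied.
 */
int toy_remote_start_stop(bool preempt_rt, bool patched)
{
	struct toy_ctx c = { false, 0 };
	bool disable_irqs = !(preempt_rt && patched);

	if (disable_irqs)
		toy_local_irq_save(&c);
	if (preempt_rt)
		toy_sleeping_lock(&c);	/* non-RT locks spin and never sleep */
	if (disable_irqs)
		toy_local_irq_restore(&c);
	return c.violations;
}
```

In this model only the unpatched RT combination, toy_remote_start_stop(true,
false), reports a violation. The non-RT paths are unchanged by the patch.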
After making this modification and testing it, syzkaller fuzzing of the
PREEMPT_RT kernel now runs without stopping on the latest announced
Real-time Linux kernel.
Link: https://lore.kernel.org/linux-rt-devel/20250610080307.LMm1hleC@linutronix.de/
Fixes: f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq")
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Byungchul Park <byungchul@sk.com>
Cc: stable@vger.kernel.org
Cc: kasan-dev@googlegroups.com
Cc: syzkaller@googlegroups.com
Cc: linux-usb@vger.kernel.org
Cc: linux-rt-devel@lists.linux.dev
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
---
include/linux/kcov.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index 75a2fb8b16c3..c5e1b2dd0bb7 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -85,7 +85,9 @@ static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
unsigned long flags = 0;
if (in_serving_softirq()) {
+#ifndef CONFIG_PREEMPT_RT
local_irq_save(flags);
+#endif
kcov_remote_start_usb(id);
}
@@ -96,7 +98,9 @@ static inline void kcov_remote_stop_softirq(unsigned long flags)
{
if (in_serving_softirq()) {
kcov_remote_stop();
+#ifndef CONFIG_PREEMPT_RT
local_irq_restore(flags);
+#endif
}
}
--
2.50.0
* Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
2025-07-25 20:14 [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT Yunseong Kim
@ 2025-07-26 6:36 ` Greg Kroah-Hartman
2025-07-26 7:44 ` Tetsuo Handa
2025-08-08 16:33 ` Sebastian Andrzej Siewior
1 sibling, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-26 6:36 UTC (permalink / raw)
To: Yunseong Kim
Cc: Dmitry Vyukov, Andrey Konovalov, Byungchul Park,
max.byungchul.park, Yeoreum Yun, Michelle Jin, linux-kernel,
Tetsuo Handa, Alan Stern, Thomas Gleixner,
Sebastian Andrzej Siewior, stable, kasan-dev, syzkaller,
linux-usb, linux-rt-devel
On Fri, Jul 25, 2025 at 08:14:01PM +0000, Yunseong Kim wrote:
> When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, following
> bug is triggered in the ksoftirqd context.
>
> | BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> | in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 30, name: ksoftirqd/1
> | preempt_count: 0, expected: 0
> | RCU nest depth: 2, expected: 2
> | CPU: 1 UID: 0 PID: 30 Comm: ksoftirqd/1 Tainted: G W 6.16.0-rc1-rt1 #11 PREEMPT_RT
> | Tainted: [W]=WARN
> | Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
> | Call trace:
> | show_stack+0x2c/0x3c (C)
> | __dump_stack+0x30/0x40
> | dump_stack_lvl+0x148/0x1d8
> | dump_stack+0x1c/0x3c
> | __might_resched+0x2e4/0x52c
> | rt_spin_lock+0xa8/0x1bc
> | kcov_remote_start+0xb0/0x490
> | __usb_hcd_giveback_urb+0x2d0/0x5e8
> | usb_giveback_urb_bh+0x234/0x3c4
> | process_scheduled_works+0x678/0xd18
> | bh_worker+0x2f0/0x59c
> | workqueue_softirq_action+0x104/0x14c
> | tasklet_action+0x18/0x8c
> | handle_softirqs+0x208/0x63c
> | run_ksoftirqd+0x64/0x264
> | smpboot_thread_fn+0x4ac/0x908
> | kthread+0x5e8/0x734
> | ret_from_fork+0x10/0x20
Why is this only a USB thing? What is unique about it to trigger this
issue?
thanks,
greg k-h
* Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
2025-07-26 6:36 ` Greg Kroah-Hartman
@ 2025-07-26 7:44 ` Tetsuo Handa
2025-07-26 7:59 ` Greg Kroah-Hartman
0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2025-07-26 7:44 UTC (permalink / raw)
To: Greg Kroah-Hartman, Yunseong Kim
Cc: Dmitry Vyukov, Andrey Konovalov, Byungchul Park,
max.byungchul.park, Yeoreum Yun, Michelle Jin, linux-kernel,
Alan Stern, Thomas Gleixner, Sebastian Andrzej Siewior, stable,
kasan-dev, syzkaller, linux-usb, linux-rt-devel
On 2025/07/26 15:36, Greg Kroah-Hartman wrote:
> Why is this only a USB thing? What is unique about it to trigger this
> issue?
I am not sure I understood your question, but the answer could be that
__usb_hcd_giveback_urb() is a function which is a USB thing
and
kcov_remote_start_usb_softirq() is calling local_irq_save() despite CONFIG_PREEMPT_RT=y
as shown below.
static void __usb_hcd_giveback_urb(struct urb *urb)
{
(...snipped...)
kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
if (in_serving_softirq()) {
local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
kcov_remote_start_usb(id) {
kcov_remote_start(id) {
kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
(...snipped...)
local_lock_irqsave(&kcov_percpu_data.lock, flags) {
__local_lock_irqsave(lock, flags) {
#ifndef CONFIG_PREEMPT_RT
https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L125
#else
https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L235 // not calling local_irq_save(flags)
#endif
}
}
(...snipped...)
spin_lock(&kcov_remote_lock) {
#ifndef CONFIG_PREEMPT_RT
https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock.h#L351
#else
https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock_rt.h#L42 // mapped to rt_mutex which might sleep
#endif
}
(...snipped...)
}
}
}
}
}
(...snipped...)
}
* Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
2025-07-26 7:44 ` Tetsuo Handa
@ 2025-07-26 7:59 ` Greg Kroah-Hartman
2025-07-26 11:59 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-26 7:59 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Yunseong Kim, Dmitry Vyukov, Andrey Konovalov, Byungchul Park,
max.byungchul.park, Yeoreum Yun, Michelle Jin, linux-kernel,
Alan Stern, Thomas Gleixner, Sebastian Andrzej Siewior, stable,
kasan-dev, syzkaller, linux-usb, linux-rt-devel
On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
> On 2025/07/26 15:36, Greg Kroah-Hartman wrote:
> > Why is this only a USB thing? What is unique about it to trigger this
> > issue?
>
> I couldn't catch your question. But the answer could be that
>
> __usb_hcd_giveback_urb() is a function which is a USB thing
>
> and
>
> kcov_remote_start_usb_softirq() is calling local_irq_save() despite CONFIG_PREEMPT_RT=y
>
> as shown below.
>
>
>
> static void __usb_hcd_giveback_urb(struct urb *urb)
> {
> (...snipped...)
> kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
> if (in_serving_softirq()) {
> local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
> kcov_remote_start_usb(id) {
> kcov_remote_start(id) {
> kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
> (...snipped...)
> local_lock_irqsave(&kcov_percpu_data.lock, flags) {
> __local_lock_irqsave(lock, flags) {
> #ifndef CONFIG_PREEMPT_RT
> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L125
> #else
> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L235 // not calling local_irq_save(flags)
> #endif
> }
> }
> (...snipped...)
> spin_lock(&kcov_remote_lock) {
> #ifndef CONFIG_PREEMPT_RT
> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock.h#L351
> #else
> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/spinlock_rt.h#L42 // mapped to rt_mutex which might sleep
> #endif
> }
> (...snipped...)
> }
> }
> }
> }
> }
> (...snipped...)
> }
>
Ok, but then how does the big comment section for
kcov_remote_start_usb_softirq() work, where it explicitly states:
* 2. Disables interrupts for the duration of the coverage collection section.
* This allows avoiding nested remote coverage collection sections in the
* softirq context (a softirq might occur during the execution of a work in
* the BH workqueue, which runs with in_serving_softirq() > 0).
* For example, usb_giveback_urb_bh() runs in the BH workqueue with
* interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
* the middle of its remote coverage collection section, and the interrupt
* handler might invoke __usb_hcd_giveback_urb() again.
You are removing half of this function entirely, which feels very wrong
to me as any sort of solution, as you have just said that all of that
documentation entry is now not needed.
Are you sure this is ok?
thanks,
greg k-h
* Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
2025-07-26 7:59 ` Greg Kroah-Hartman
@ 2025-07-26 11:59 ` Thomas Gleixner
2025-08-01 22:06 ` Yunseong Kim
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2025-07-26 11:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Tetsuo Handa
Cc: Yunseong Kim, Dmitry Vyukov, Andrey Konovalov, Byungchul Park,
max.byungchul.park, Yeoreum Yun, Michelle Jin, linux-kernel,
Alan Stern, Sebastian Andrzej Siewior, stable, kasan-dev,
syzkaller, linux-usb, linux-rt-devel
On Sat, Jul 26 2025 at 09:59, Greg Kroah-Hartman wrote:
> On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
>> static void __usb_hcd_giveback_urb(struct urb *urb)
>> {
>> (...snipped...)
>> kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
>> if (in_serving_softirq()) {
>> local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
>> kcov_remote_start_usb(id) {
>> kcov_remote_start(id) {
>> kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
>> (...snipped...)
>> local_lock_irqsave(&kcov_percpu_data.lock, flags) {
>> __local_lock_irqsave(lock, flags) {
>> #ifndef CONFIG_PREEMPT_RT
>> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L125
>> #else
>> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L235 // not calling local_irq_save(flags)
>> #endif
Right, it does not invoke local_irq_save(flags), but it takes the
underlying lock, which means it prevents reentrance.
> Ok, but then how does the big comment section for
> kcov_remote_start_usb_softirq() work, where it explicitly states:
>
> * 2. Disables interrupts for the duration of the coverage collection section.
> * This allows avoiding nested remote coverage collection sections in the
> * softirq context (a softirq might occur during the execution of a work in
> * the BH workqueue, which runs with in_serving_softirq() > 0).
> * For example, usb_giveback_urb_bh() runs in the BH workqueue with
> * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
> * the middle of its remote coverage collection section, and the interrupt
> * handler might invoke __usb_hcd_giveback_urb() again.
>
>
> You are removing half of this function entirely, which feels very wrong
> to me as any sort of solution, as you have just said that all of that
> documentation entry is now not needed.
I'm not so sure because kcov_percpu_data.lock is only held within
kcov_remote_start() and kcov_remote_stop(), but the above comment
suggests that the whole section needs to be serialized.
Though I'm not a KCOV wizard and might be completely wrong here.
If the whole section is required to be serialized, then this needs
another local lock in kcov_percpu_data to work correctly on RT.
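A minimal userspace sketch of that concern, under the stated assumption that
the lock is only held inside the start and stop calls. All toy_* names are
hypothetical and the lock is a plain flag. It only shows that such a lock, by
itself, does not keep a second start from arriving while a section is open.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_kcov {
	bool lock_held;		/* models kcov_percpu_data.lock */
	bool section_active;	/* a remote coverage section is open */
	int nested_starts;	/* starts that arrived while a section was open */
};

static void toy_lock(struct toy_kcov *k)
{
	assert(!k->lock_held);
	k->lock_held = true;
}

static void toy_unlock(struct toy_kcov *k)
{
	assert(k->lock_held);
	k->lock_held = false;
}

void toy_remote_start(struct toy_kcov *k)
{
	toy_lock(k);
	if (k->section_active)
		k->nested_starts++;	/* nesting was not prevented */
	k->section_active = true;
	toy_unlock(k);			/* lock dropped, section stays open */
}

void toy_remote_stop(struct toy_kcov *k)
{
	toy_lock(k);
	k->section_active = false;
	toy_unlock(k);
}
```

Calling toy_remote_start() twice without a stop in between never trips the
lock assertions, yet nested_starts becomes 1. Serializing the whole section
would need a lock that stays held from start to stop.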
Thanks,
tglx
* Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
2025-07-26 11:59 ` Thomas Gleixner
@ 2025-08-01 22:06 ` Yunseong Kim
0 siblings, 0 replies; 9+ messages in thread
From: Yunseong Kim @ 2025-08-01 22:06 UTC (permalink / raw)
To: Thomas Gleixner, Greg Kroah-Hartman, Tetsuo Handa
Cc: Dmitry Vyukov, Andrey Konovalov, Byungchul Park,
max.byungchul.park, Yeoreum Yun, Michelle Jin, linux-kernel,
Alan Stern, Sebastian Andrzej Siewior, stable, kasan-dev,
syzkaller, linux-usb, linux-rt-devel
Huge thanks to everyone for the feedback!
While working on earlier patches, running syzkaller on PREEMPT_RT uncovered
numerous sleep-in-atomic-context bugs and other synchronization issues unique to
that environment. This highlighted the need to address these problems.
On 7/26/25 8:59 PM, Thomas Gleixner wrote:
> On Sat, Jul 26 2025 at 09:59, Greg Kroah-Hartman wrote:
>> On Sat, Jul 26, 2025 at 04:44:42PM +0900, Tetsuo Handa wrote:
>>> static void __usb_hcd_giveback_urb(struct urb *urb)
>>> {
>>> (...snipped...)
>>> kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum) {
>>> if (in_serving_softirq()) {
>>> local_irq_save(flags); // calling local_irq_save() is wrong if CONFIG_PREEMPT_RT=y
>>> kcov_remote_start_usb(id) {
>>> kcov_remote_start(id) {
>>> kcov_remote_start(kcov_remote_handle(KCOV_SUBSYSTEM_USB, id)) {
>>> (...snipped...)
>>> local_lock_irqsave(&kcov_percpu_data.lock, flags) {
>>> __local_lock_irqsave(lock, flags) {
>>> #ifndef CONFIG_PREEMPT_RT
>>> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L125
>>> #else
>>> https://elixir.bootlin.com/linux/v6.16-rc7/source/include/linux/local_lock_internal.h#L235 // not calling local_irq_save(flags)
>>> #endif
>
> Right, it does not invoke local_irq_save(flags), but it takes the
> underlying lock, which means it prevents reentrance.
>
>> Ok, but then how does the big comment section for
>> kcov_remote_start_usb_softirq() work, where it explicitly states:
>>
>> * 2. Disables interrupts for the duration of the coverage collection section.
>> * This allows avoiding nested remote coverage collection sections in the
>> * softirq context (a softirq might occur during the execution of a work in
>> * the BH workqueue, which runs with in_serving_softirq() > 0).
>> * For example, usb_giveback_urb_bh() runs in the BH workqueue with
>> * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
>> * the middle of its remote coverage collection section, and the interrupt
>> * handler might invoke __usb_hcd_giveback_urb() again.
>>
>>
>> You are removing half of this function entirely, which feels very wrong
>> to me as any sort of solution, as you have just said that all of that
>> documentation entry is now not needed.
>
> I'm not so sure because kcov_percpu_data.lock is only held within
> kcov_remote_start() and kcov_remote_stop(), but the above comment
> suggests that the whole section needs to be serialized.
>
> Though I'm not a KCOV wizard and might be completely wrong here.
>
> If the whole section is required to be serialized, then this need
> another local lock in kcov_percpu_data to work correctly on RT.
>
> Thanks,
>
> tglx
After receiving comments from maintainers, I realized that my initial patch set
wasn't heading in the right direction.
It seems that the following two patches conflict on PREEMPT_RT kernels:
1. kcov: replace local_irq_save() with a local_lock_t
Link: https://github.com/torvalds/linux/commit/d5d2c51f1e5f
2. kcov, usb: disable interrupts in kcov_remote_start_usb_softirq
Link: https://github.com/torvalds/linux/commit/f85d39dd7ed8
My current approach involves:
* Removing the existing 'kcov_percpu_data.lock'
* Converting 'kcov->lock' and 'kcov_remote_lock' to raw spinlocks
* Relocating the kmalloc call for kcov_remote_add() outside kcov_ioctl_locked(),
  as GFP_ATOMIC allocations can potentially sleep under PREEMPT_RT.
  As expected, further testing showed that keeping the GFP_ATOMIC allocation
  inside kcov_remote_add() still leads to a sleep in atomic context.
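The "allocate first, then take the lock" pattern from the last item can be
sketched in userspace as follows. This is an illustration under assumptions,
not the planned v2 code: the toy_* names are hypothetical, malloc() stands in
for kmalloc(), and a flag stands in for a raw spinlock so that an allocation
attempted while it is "held" trips an assertion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_remote {
	unsigned long handle;
	struct toy_remote *next;
};

static struct toy_remote *toy_list;
static bool toy_raw_lock_held;	/* stand-in for a raw spinlock */

/* Allocation is only legal while the raw lock is not held. */
static void *toy_alloc(size_t size)
{
	assert(!toy_raw_lock_held);
	return malloc(size);
}

int toy_remote_add(unsigned long handle)
{
	struct toy_remote *r = toy_alloc(sizeof(*r));	/* before locking */
	struct toy_remote *it;

	if (!r)
		return -ENOMEM;

	toy_raw_lock_held = true;
	for (it = toy_list; it; it = it->next) {
		if (it->handle == handle) {
			toy_raw_lock_held = false;
			free(r);	/* preallocated node was not needed */
			return -EEXIST;
		}
	}
	r->handle = handle;
	r->next = toy_list;
	toy_list = r;
	toy_raw_lock_held = false;
	return 0;
}
```

The cost of this ordering is an occasional wasted allocation when the handle
already exists. In exchange, nothing that might sleep runs inside the locked
region.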
This approach allows us to keep Andrey’s patch d5d2c51f1e5f while making
modifications as Sebastian suggested in his commit f85d39dd7ed8 message,
which I found particularly insightful and full of helpful hints.
The work I'm doing on PATCH v2 involves a number of changes, and I would truly
appreciate any critical feedback. I'm always happy to hear insights!
Best regards,
Yunseong Kim
* Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
2025-07-25 20:14 [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT Yunseong Kim
2025-07-26 6:36 ` Greg Kroah-Hartman
@ 2025-08-08 16:33 ` Sebastian Andrzej Siewior
2025-08-08 17:35 ` Yunseong Kim
1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-08 16:33 UTC (permalink / raw)
To: Yunseong Kim
Cc: Dmitry Vyukov, Andrey Konovalov, Byungchul Park,
max.byungchul.park, Yeoreum Yun, Michelle Jin, linux-kernel,
Tetsuo Handa, Alan Stern, Greg Kroah-Hartman, Thomas Gleixner,
stable, kasan-dev, syzkaller, linux-usb, linux-rt-devel
On 2025-07-25 20:14:01 [+0000], Yunseong Kim wrote:
> When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, following
> bug is triggered in the ksoftirqd context.
>
…
> This issue was introduced by commit
> f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq").
>
> However, this creates a conflict on PREEMPT_RT kernels. The local_irq_save()
> call establishes an atomic context where sleeping is forbidden. Inside this
> context, kcov_remote_start() is called, which on PREEMPT_RT uses sleeping
> locks (spinlock_t and local_lock_t are mapped to rt_mutex). This results in
> a sleeping function called from invalid context.
>
> On PREEMPT_RT, interrupt handlers are threaded, so the re-entrancy scenario
> is already safely handled by the existing local_lock_t and the global
> kcov_remote_lock within kcov_remote_start(). Therefore, the outer
> local_irq_save() is not necessary.
>
> This preserves the intended re-entrancy protection for non-RT kernels while
> resolving the locking violation on PREEMPT_RT kernels.
>
> After making this modification and testing it, syzkaller fuzzing the
> PREEMPT_RT kernel is now running without stopping on latest announced
> Real-time Linux.
This looks oddly familiar because I removed the irq-disable bits while
adding local-locks.
Commit f85d39dd7ed8 looks wrong: it shouldn't disable
interrupts. The statement in the added comment
| + * 2. Disables interrupts for the duration of the coverage collection section.
| + * This allows avoiding nested remote coverage collection sections in the
| + * softirq context (a softirq might occur during the execution of a work in
| + * the BH workqueue, which runs with in_serving_softirq() > 0).
is wrong. Softirqs never nest. While the BH workqueue is
running, another softirq does not occur. The softirq is raised (again)
and will be handled _after_ the BH workqueue is done. So this is already
serialised.
The issue is that __usb_hcd_giveback_urb() always invokes
kcov_remote_start_usb_softirq(). __usb_hcd_giveback_urb() itself is
invoked from BH context (for the majority of HCDs) and from hardirq
context for the root-HUB. This gets us to the scenario that we are
in the give-back path in softirq context and then invoke the function
once again in hardirq context.
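To make the false nesting concrete, here is a small userspace model of the
context check. The bit layout is an assumption modelled on the preempt_count
fields in include/linux/preempt.h, and the toy_* names are hypothetical. It
compares a check on softirq context alone with one that also excludes hardirq
context.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout: softirq count starts at bit 8, hardirq count at bit 16. */
#define TOY_SOFTIRQ_OFFSET	(1u << 8)
#define TOY_HARDIRQ_OFFSET	(1u << 16)
#define TOY_HARDIRQ_MASK	(0xfu << 16)

static bool toy_in_serving_softirq(unsigned int pc)
{
	return (pc & TOY_SOFTIRQ_OFFSET) != 0;
}

static bool toy_in_hardirq(unsigned int pc)
{
	return (pc & TOY_HARDIRQ_MASK) != 0;
}

/* Check on softirq context only. */
bool toy_collect_softirq_only(unsigned int pc)
{
	return toy_in_serving_softirq(pc);
}

/* Check that additionally refuses to collect in hardirq context. */
bool toy_collect_no_hardirq(unsigned int pc)
{
	return toy_in_serving_softirq(pc) && !toy_in_hardirq(pc);
}
```

With pc = TOY_SOFTIRQ_OFFSET + TOY_HARDIRQ_OFFSET (a hardirq that interrupted
softirq processing), the first check still says "collect", which is the
nested start. The second check declines in that state and agrees with the
first one everywhere else.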
I have no idea how kcov works, but reverting the original commit and
avoiding the false nesting due to hardirq context should do the trick.
An untested patch follows.
This isn't any different than the tasklet handling that was used before
so I am not sure why it is now a problem.
Could someone maybe test this?
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1636,7 +1636,6 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus);
struct usb_anchor *anchor = urb->anchor;
int status = urb->unlinked;
- unsigned long flags;
urb->hcpriv = NULL;
if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) &&
@@ -1654,14 +1653,13 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
/* pass ownership to the completion handler */
urb->status = status;
/*
- * Only collect coverage in the softirq context and disable interrupts
- * to avoid scenarios with nested remote coverage collection sections
- * that KCOV does not support.
- * See the comment next to kcov_remote_start_usb_softirq() for details.
+ * This function can be called in task context inside another remote
+ * coverage collection section, but kcov doesn't support that kind of
+ * recursion yet. Only collect coverage in softirq context for now.
*/
- flags = kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
+ kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
urb->complete(urb);
- kcov_remote_stop_softirq(flags);
+ kcov_remote_stop_softirq();
usb_anchor_resume_wakeups(anchor);
atomic_dec(&urb->use_count);
diff --git a/include/linux/kcov.h b/include/linux/kcov.h
index 75a2fb8b16c32..0143358874b07 100644
--- a/include/linux/kcov.h
+++ b/include/linux/kcov.h
@@ -57,47 +57,21 @@ static inline void kcov_remote_start_usb(u64 id)
/*
* The softirq flavor of kcov_remote_*() functions is introduced as a temporary
- * workaround for KCOV's lack of nested remote coverage sections support.
- *
- * Adding support is tracked in https://bugzilla.kernel.org/show_bug.cgi?id=210337.
- *
- * kcov_remote_start_usb_softirq():
- *
- * 1. Only collects coverage when called in the softirq context. This allows
- * avoiding nested remote coverage collection sections in the task context.
- * For example, USB/IP calls usb_hcd_giveback_urb() in the task context
- * within an existing remote coverage collection section. Thus, KCOV should
- * not attempt to start collecting coverage within the coverage collection
- * section in __usb_hcd_giveback_urb() in this case.
- *
- * 2. Disables interrupts for the duration of the coverage collection section.
- * This allows avoiding nested remote coverage collection sections in the
- * softirq context (a softirq might occur during the execution of a work in
- * the BH workqueue, which runs with in_serving_softirq() > 0).
- * For example, usb_giveback_urb_bh() runs in the BH workqueue with
- * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
- * the middle of its remote coverage collection section, and the interrupt
- * handler might invoke __usb_hcd_giveback_urb() again.
+ * work around for kcov's lack of nested remote coverage sections support in
+ * task context. Adding support for nested sections is tracked in:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=210337
*/
-static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
+static inline void kcov_remote_start_usb_softirq(u64 id)
{
- unsigned long flags = 0;
-
- if (in_serving_softirq()) {
- local_irq_save(flags);
+ if (in_serving_softirq() && !in_hardirq())
kcov_remote_start_usb(id);
- }
-
- return flags;
}
-static inline void kcov_remote_stop_softirq(unsigned long flags)
+static inline void kcov_remote_stop_softirq(void)
{
- if (in_serving_softirq()) {
+ if (in_serving_softirq() && !in_hardirq())
kcov_remote_stop();
- local_irq_restore(flags);
- }
}
#ifdef CONFIG_64BIT
@@ -131,11 +105,8 @@ static inline u64 kcov_common_handle(void)
}
static inline void kcov_remote_start_common(u64 id) {}
static inline void kcov_remote_start_usb(u64 id) {}
-static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
-{
- return 0;
-}
-static inline void kcov_remote_stop_softirq(unsigned long flags) {}
+static inline void kcov_remote_start_usb_softirq(u64 id) {}
+static inline void kcov_remote_stop_softirq(void) {}
#endif /* CONFIG_KCOV */
#endif /* _LINUX_KCOV_H */
--
2.50.1
Sebastian
* Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
2025-08-08 16:33 ` Sebastian Andrzej Siewior
@ 2025-08-08 17:35 ` Yunseong Kim
2025-08-11 8:31 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 9+ messages in thread
From: Yunseong Kim @ 2025-08-08 17:35 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Dmitry Vyukov, Andrey Konovalov, Byungchul Park,
max.byungchul.park, Yeoreum Yun, Michelle Jin, linux-kernel,
Tetsuo Handa, Alan Stern, Greg Kroah-Hartman, Thomas Gleixner,
stable, kasan-dev, syzkaller, linux-usb, linux-rt-devel,
Austin Kim
Hi Sebastian,
I was waiting for your review — thanks!
On 8/9/25 1:33 AM, Sebastian Andrzej Siewior wrote:
> On 2025-07-25 20:14:01 [+0000], Yunseong Kim wrote:
>> When fuzzing USB with syzkaller on a PREEMPT_RT enabled kernel, following
>> bug is triggered in the ksoftirqd context.
>>
> …
>> This issue was introduced by commit
>> f85d39dd7ed8 ("kcov, usb: disable interrupts in kcov_remote_start_usb_softirq").
>>
>> However, this creates a conflict on PREEMPT_RT kernels. The local_irq_save()
>> call establishes an atomic context where sleeping is forbidden. Inside this
>> context, kcov_remote_start() is called, which on PREEMPT_RT uses sleeping
>> locks (spinlock_t and local_lock_t are mapped to rt_mutex). This results in
>> a sleeping function called from invalid context.
>>
>> On PREEMPT_RT, interrupt handlers are threaded, so the re-entrancy scenario
>> is already safely handled by the existing local_lock_t and the global
>> kcov_remote_lock within kcov_remote_start(). Therefore, the outer
>> local_irq_save() is not necessary.
>>
>> This preserves the intended re-entrancy protection for non-RT kernels while
>> resolving the locking violation on PREEMPT_RT kernels.
>>
>> After making this modification and testing it, syzkaller fuzzing the
>> PREEMPT_RT kernel is now running without stopping on latest announced
>> Real-time Linux.
>
> This looks oddly familiar because I removed the irq-disable bits while
> adding local-locks.
>
> Commit f85d39dd7ed8 looks wrong not that it shouldn't disable
> interrupts. The statement in the added comment
>
> | + * 2. Disables interrupts for the duration of the coverage collection section.
> | + * This allows avoiding nested remote coverage collection sections in the
> | + * softirq context (a softirq might occur during the execution of a work in
> | + * the BH workqueue, which runs with in_serving_softirq() > 0).
>
> is wrong. Softirqs are never nesting. While the BH workqueue is
> running another softirq does not occur. The softirq is raised (again)
> and will be handled _after_ BH workqueue is done. So this is already
> serialised.
>
> The issue is __usb_hcd_giveback_urb() always invokes
> kcov_remote_start_usb_softirq(). __usb_hcd_giveback_urb() itself is
> invoked from BH context (for the majority of HCDs) and from hardirq
> context for the root-HUB. This gets us to the scenario that that we are
> in the give-back path in softirq context and then invoke the function
> once again in hardirq context.
>
> I have no idea how kcov works but reverting the original commit and
> avoiding the false nesting due to hardirq context should do the trick,
> an untested patch follows.
>
> This isn't any different than the tasklet handling that was used before
> so I am not sure why it is now a problem.
Thank you for the detailed analysis and the patch. Your explanation about
the real re-entrancy issue being "softirq vs. hardirq" and the faulty
premise in the original commit makes perfect sense.
> Could someone maybe test this?
As you requested, I have tested your patch on my setup.
I can confirm that your patch resolves the issue. I have been running
syzkaller for several hours, and the "sleeping function called
from invalid context" bug is no longer triggered.
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -1636,7 +1636,6 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
> struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus);
> struct usb_anchor *anchor = urb->anchor;
> int status = urb->unlinked;
> - unsigned long flags;
>
> urb->hcpriv = NULL;
> if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) &&
> @@ -1654,14 +1653,13 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
> /* pass ownership to the completion handler */
> urb->status = status;
> /*
> - * Only collect coverage in the softirq context and disable interrupts
> - * to avoid scenarios with nested remote coverage collection sections
> - * that KCOV does not support.
> - * See the comment next to kcov_remote_start_usb_softirq() for details.
> + * This function can be called in task context inside another remote
> + * coverage collection section, but kcov doesn't support that kind of
> + * recursion yet. Only collect coverage in softirq context for now.
> */
> - flags = kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
> + kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum);
> urb->complete(urb);
> - kcov_remote_stop_softirq(flags);
> + kcov_remote_stop_softirq();
>
> usb_anchor_resume_wakeups(anchor);
> atomic_dec(&urb->use_count);
> diff --git a/include/linux/kcov.h b/include/linux/kcov.h
> index 75a2fb8b16c32..0143358874b07 100644
> --- a/include/linux/kcov.h
> +++ b/include/linux/kcov.h
> @@ -57,47 +57,21 @@ static inline void kcov_remote_start_usb(u64 id)
>
> /*
> * The softirq flavor of kcov_remote_*() functions is introduced as a temporary
> - * workaround for KCOV's lack of nested remote coverage sections support.
> - *
> - * Adding support is tracked in https://bugzilla.kernel.org/show_bug.cgi?id=210337.
> - *
> - * kcov_remote_start_usb_softirq():
> - *
> - * 1. Only collects coverage when called in the softirq context. This allows
> - * avoiding nested remote coverage collection sections in the task context.
> - * For example, USB/IP calls usb_hcd_giveback_urb() in the task context
> - * within an existing remote coverage collection section. Thus, KCOV should
> - * not attempt to start collecting coverage within the coverage collection
> - * section in __usb_hcd_giveback_urb() in this case.
> - *
> - * 2. Disables interrupts for the duration of the coverage collection section.
> - * This allows avoiding nested remote coverage collection sections in the
> - * softirq context (a softirq might occur during the execution of a work in
> - * the BH workqueue, which runs with in_serving_softirq() > 0).
> - * For example, usb_giveback_urb_bh() runs in the BH workqueue with
> - * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in
> - * the middle of its remote coverage collection section, and the interrupt
> - * handler might invoke __usb_hcd_giveback_urb() again.
> + * work around for kcov's lack of nested remote coverage sections support in
> + * task context. Adding support for nested sections is tracked in:
> + * https://bugzilla.kernel.org/show_bug.cgi?id=210337
> */
>
> -static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
> +static inline void kcov_remote_start_usb_softirq(u64 id)
> {
> - unsigned long flags = 0;
> -
> - if (in_serving_softirq()) {
> - local_irq_save(flags);
> + if (in_serving_softirq() && !in_hardirq())
> kcov_remote_start_usb(id);
> - }
> -
> - return flags;
> }
>
> -static inline void kcov_remote_stop_softirq(unsigned long flags)
> +static inline void kcov_remote_stop_softirq(void)
> {
> - if (in_serving_softirq()) {
> + if (in_serving_softirq() && !in_hardirq())
> kcov_remote_stop();
> - local_irq_restore(flags);
> - }
> }
>
> #ifdef CONFIG_64BIT
> @@ -131,11 +105,8 @@ static inline u64 kcov_common_handle(void)
> }
> static inline void kcov_remote_start_common(u64 id) {}
> static inline void kcov_remote_start_usb(u64 id) {}
> -static inline unsigned long kcov_remote_start_usb_softirq(u64 id)
> -{
> - return 0;
> -}
> -static inline void kcov_remote_stop_softirq(unsigned long flags) {}
> +static inline void kcov_remote_start_usb_softirq(u64 id) {}
> +static inline void kcov_remote_stop_softirq(void) {}
>
> #endif /* CONFIG_KCOV */
> #endif /* _LINUX_KCOV_H */
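For readers following the quoted diff, here is a minimal userspace sketch of the new guard, `in_serving_softirq() && !in_hardirq()`. It is only a model under stated assumptions: every `model_*` name is hypothetical, the bit layout is a simplified stand-in for the kernel's preempt count, and the real macros differ in detail, particularly on PREEMPT_RT. It shows a giveback running in softirq context opening one coverage section, while a second giveback issued from a hard interrupt that lands inside that section is skipped instead of nesting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: simplified stand-in for the kernel's preempt count bits. */
#define MODEL_SOFTIRQ_OFFSET (1u << 8)    /* set while a softirq handler runs */
#define MODEL_HARDIRQ_OFFSET (1u << 16)   /* one level of hardirq nesting */
#define MODEL_HARDIRQ_MASK   (0xfu << 16)

static unsigned int model_preempt_count;  /* hypothetical context state */
static bool section_active;               /* a remote section is open */
static int sections_started;              /* how many sections were opened */

static bool model_in_serving_softirq(void)
{
	return (model_preempt_count & MODEL_SOFTIRQ_OFFSET) != 0;
}

static bool model_in_hardirq(void)
{
	return (model_preempt_count & MODEL_HARDIRQ_MASK) != 0;
}

/* Stand-ins for kcov_remote_start_usb() and kcov_remote_stop(). */
static void model_remote_start(uint64_t id)
{
	(void)id;
	assert(!section_active);  /* nested sections are not supported */
	section_active = true;
	sections_started++;
}

static void model_remote_stop(void)
{
	section_active = false;
}

/* Same condition as the patched helpers: softirq only, never hardirq. */
static void model_start_usb_softirq(uint64_t id)
{
	if (model_in_serving_softirq() && !model_in_hardirq())
		model_remote_start(id);
}

static void model_stop_softirq(void)
{
	if (model_in_serving_softirq() && !model_in_hardirq())
		model_remote_stop();
}

/* Models the giveback path: the completion callback runs in the section. */
static void model_giveback(void (*complete)(void))
{
	model_start_usb_softirq(1);
	if (complete)
		complete();
	model_stop_softirq();
}

/* A hard interrupt arriving mid-section that gives back another URB. */
static void model_hardirq_during_section(void)
{
	model_preempt_count += MODEL_HARDIRQ_OFFSET;
	model_giveback(NULL);  /* skipped by the !in_hardirq() check */
	model_preempt_count -= MODEL_HARDIRQ_OFFSET;
}

/* Runs one softirq giveback that gets interrupted; returns sections opened. */
static int model_run_softirq_giveback(void)
{
	sections_started = 0;
	section_active = false;
	model_preempt_count = MODEL_SOFTIRQ_OFFSET;
	model_giveback(model_hardirq_during_section);
	model_preempt_count = 0;
	return sections_started;
}
```

In this model the interrupted softirq giveback opens exactly one section, and a giveback from plain task context (count of zero) opens none. The old code reached the same result by disabling interrupts around the section, which is what led to the sleeping lock being taken with interrupts off on PREEMPT_RT.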
I was really impressed by your "How to Not Break PREEMPT_RT" talk at LPC 22.
Tested-by: Yunseong Kim <ysk@kzalloc.com>
Thanks,
Yunseong Kim
* Re: [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
2025-08-08 17:35 ` Yunseong Kim
@ 2025-08-11 8:31 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-11 8:31 UTC (permalink / raw)
To: Yunseong Kim
Cc: Dmitry Vyukov, Andrey Konovalov, Byungchul Park,
max.byungchul.park, Yeoreum Yun, Michelle Jin, linux-kernel,
Tetsuo Handa, Alan Stern, Greg Kroah-Hartman, Thomas Gleixner,
stable, kasan-dev, syzkaller, linux-usb, linux-rt-devel,
Austin Kim
On 2025-08-09 02:35:48 [+0900], Yunseong Kim wrote:
> Hi Sebastian,
Hi Yunseong,
> > Could someone maybe test this?
>
> As you requested, I have tested your patch on my setup.
>
> I can confirm that your patch resolves the issue. I have been running
> syzkaller for several hours, and the "sleeping function called
> from invalid context" bug is no longer triggered.
Thank you. I just sent this as a proper patch assuming kcov still does
what it should. I just don't understand why this triggers after moving
to workqueues and did not with the tasklet setup. Other than the
workqueue code having a bit more overhead, it is the same thing.
> I was really impressed by your "How to Not Break PREEMPT_RT" talk at LPC 22.
Thank you.
>
> Tested-by: Yunseong Kim <ysk@kzalloc.com>
>
>
> Thanks,
>
> Yunseong Kim
Sebastian
Thread overview: 9+ messages
2025-07-25 20:14 [PATCH] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT Yunseong Kim
2025-07-26 6:36 ` Greg Kroah-Hartman
2025-07-26 7:44 ` Tetsuo Handa
2025-07-26 7:59 ` Greg Kroah-Hartman
2025-07-26 11:59 ` Thomas Gleixner
2025-08-01 22:06 ` Yunseong Kim
2025-08-08 16:33 ` Sebastian Andrzej Siewior
2025-08-08 17:35 ` Yunseong Kim
2025-08-11 8:31 ` Sebastian Andrzej Siewior