* [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Eric Dumazet @ 2024-02-01 17:53 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet, syzbot, Jiri Pirko
Many syzbot reports include the following trace [1].
If nsim_dev_trap_report_work() cannot grab the mutex,
it should rearm itself at least one jiffy later.
[1]
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Workqueue: events nsim_dev_trap_report_work
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
instrument_atomic_read include/linux/instrumented.h:68 [inline]
atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
_raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
debug_object_activate+0x349/0x540 lib/debugobjects.c:726
debug_work_activate kernel/workqueue.c:578 [inline]
insert_work+0x30/0x230 kernel/workqueue.c:1650
__queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
__queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
queue_delayed_work include/linux/workqueue.h:563 [inline]
schedule_delayed_work include/linux/workqueue.h:677 [inline]
nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Fixes: 012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@nvidia.com>
---
drivers/net/netdevsim/dev.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index b4d3b9cde8bd685202f135cf9c845d1be76ef428..92a7a36b93ac0cc1b02a551b974fb390254ac484 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -835,14 +835,14 @@ static void nsim_dev_trap_report_work(struct work_struct *work)
 				      trap_report_dw.work);
 	nsim_dev = nsim_trap_data->nsim_dev;
 
-	/* For each running port and enabled packet trap, generate a UDP
-	 * packet with a random 5-tuple and report it.
-	 */
 	if (!devl_trylock(priv_to_devlink(nsim_dev))) {
-		schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 0);
+		schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 1);
 		return;
 	}
 
+	/* For each running port and enabled packet trap, generate a UDP
+	 * packet with a random 5-tuple and report it.
+	 */
 	list_for_each_entry(nsim_dev_port, &nsim_dev->port_list, list) {
 		if (!netif_running(nsim_dev_port->ns->netdev))
 			continue;
--
2.43.0.429.g432eaa2c6b-goog
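The effect of the one-jiffy delay can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code and not part of the patch: `simulate_requeue()`, `struct sim_result`, and the tick counter are hypothetical names, and the model assumes the pessimistic case in which the lock holder makes no progress while the handler keeps re-running within the same tick.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: time advances in whole ticks
 * ("jiffies"), and the lock is held by someone else until `release_tick`.
 */
struct sim_result {
	unsigned long runs;	/* how many times the handler ran */
	bool acquired;		/* whether the trylock ever succeeded */
};

/* Re-run the handler until its trylock succeeds or `max_runs` is reached.
 * `delay` is the number of ticks to wait before each retry. With delay == 0
 * the retry happens in the same tick, so simulated time never advances.
 */
static struct sim_result simulate_requeue(unsigned long release_tick,
					  unsigned long delay,
					  unsigned long max_runs)
{
	struct sim_result res = { 0, false };
	unsigned long now = 0;

	while (res.runs < max_runs) {
		res.runs++;
		if (now >= release_tick) {	/* trylock succeeds */
			res.acquired = true;
			break;
		}
		now += delay;			/* requeue `delay` ticks later */
	}
	return res;
}
```

In this model, a lock released at tick 5 is taken on the sixth run when the retry delay is 1, whereas a delay of 0 uses up the whole run budget without ever taking the lock.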
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Paolo Abeni @ 2024-02-01 18:49 UTC
To: Eric Dumazet, David S . Miller, Jakub Kicinski
Cc: netdev, eric.dumazet, syzbot, Jiri Pirko
On Thu, 2024-02-01 at 17:53 +0000, Eric Dumazet wrote:
> Many syzbot reports include the following trace [1]
>
> If nsim_dev_trap_report_work() can not grab the mutex,
> it should rearm itself at least one jiffie later.
>
> [1]
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> Workqueue: events nsim_dev_trap_report_work
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
> Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
> RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
> RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
> RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
> RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
> R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
> R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
> FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <NMI>
> </NMI>
> <TASK>
> instrument_atomic_read include/linux/instrumented.h:68 [inline]
> atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
> queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
> debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
> do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
> __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
> _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
> debug_object_activate+0x349/0x540 lib/debugobjects.c:726
> debug_work_activate kernel/workqueue.c:578 [inline]
> insert_work+0x30/0x230 kernel/workqueue.c:1650
> __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
> __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
> queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
> queue_delayed_work include/linux/workqueue.h:563 [inline]
> schedule_delayed_work include/linux/workqueue.h:677 [inline]
> nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
> process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> process_scheduled_works kernel/workqueue.c:2706 [inline]
> worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
>
> Fixes: 012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini")
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jiri Pirko <jiri@nvidia.com>
> ---
> drivers/net/netdevsim/dev.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> index b4d3b9cde8bd685202f135cf9c845d1be76ef428..92a7a36b93ac0cc1b02a551b974fb390254ac484 100644
> --- a/drivers/net/netdevsim/dev.c
> +++ b/drivers/net/netdevsim/dev.c
> @@ -835,14 +835,14 @@ static void nsim_dev_trap_report_work(struct work_struct *work)
> trap_report_dw.work);
> nsim_dev = nsim_trap_data->nsim_dev;
>
> - /* For each running port and enabled packet trap, generate a UDP
> - * packet with a random 5-tuple and report it.
> - */
> if (!devl_trylock(priv_to_devlink(nsim_dev))) {
> - schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 0);
> + schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 1);
The patch LGTM, thanks!
I'm wondering if we have a similar problem in
devlink_rel_nested_in_notify_work():
	if (!devl_trylock(devlink)) {
		devlink_put(devlink);
		goto reschedule_work;
	}

	//...
reschedule_work:
	schedule_work(&rel->nested_in.notify_work);
And possibly adding 1ms delay there could be problematic?
Cheers,
Paolo
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Eric Dumazet @ 2024-02-01 20:10 UTC
To: Paolo Abeni
Cc: David S . Miller, Jakub Kicinski, netdev, eric.dumazet, syzbot,
Jiri Pirko
On Thu, Feb 1, 2024 at 7:49 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The patch LGTM, thanks!
>
> I'm wondering if we have a similar problem in
> devlink_rel_nested_in_notify_work():
>
> if (!devl_trylock(devlink)) {
> devlink_put(devlink);
> goto reschedule_work;
> }
>
> //...
> reschedule_work:
> schedule_work(&rel->nested_in.notify_work);
>
> And possibly adding 1ms delay there could be problematic?
A conversion to schedule_delayed_work() would be needed I think.
I looked at all syzbot reports and did not find
devlink_rel_nested_in_notify_work() in them,
I guess we were lucky all this time :)
>
> Cheers,
>
> Paolo
>
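The general pattern under discussion, pausing before retrying a failed trylock instead of retrying immediately, can be sketched in user space with POSIX threads. This is an illustration only, not the devlink code and not a proposed fix: `lock_with_delayed_retry()` and its parameters are hypothetical, and `nanosleep()` merely stands in for the delay that a delayed work item would provide in the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: try to take `lock` up to `max_attempts` times,
 * sleeping `delay_ns` nanoseconds (must be below one second) after each
 * failed attempt instead of retrying immediately.
 * Returns 0 with the lock held, EBUSY if every attempt found it busy,
 * or another error code reported by pthread_mutex_trylock().
 */
static int lock_with_delayed_retry(pthread_mutex_t *lock, int max_attempts,
				   long delay_ns)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = delay_ns };
	int attempt;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		int err = pthread_mutex_trylock(lock);

		if (err == 0)
			return 0;		/* lock acquired */
		if (err != EBUSY)
			return err;		/* unexpected error */
		nanosleep(&delay, NULL);	/* back off before retrying */
	}
	return EBUSY;
}
```

A caller that gets EBUSY back can give up or try again later; the point of the sketch is only that each failed attempt yields the CPU for a while rather than looping straight back to the trylock.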
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Jakub Kicinski @ 2024-02-01 21:41 UTC
To: Eric Dumazet
Cc: Paolo Abeni, David S . Miller, netdev, eric.dumazet, syzbot,
Jiri Pirko
On Thu, 1 Feb 2024 21:10:46 +0100 Eric Dumazet wrote:
> > And possibly adding 1ms delay there could be problematic?
>
> A conversion to schedule_delayed_work() would be needed I think.
>
> I looked at all syzbot reports and did not find
> devlink_rel_nested_in_notify_work() in them,
> I guess we were lucky all this time :)
FWIW the devlink_rel_* stuff is for linecards and SIOV sub function
instances, netdevsim can't fake those so syzbot probably never
exercises that code :(
Jiri is on CC, so we can consider him notified about the problem
and leave it to him? :)
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Jiri Pirko @ 2024-02-02 8:39 UTC
To: Jakub Kicinski
Cc: Eric Dumazet, Paolo Abeni, David S . Miller, netdev, eric.dumazet,
syzbot, Jiri Pirko
Thu, Feb 01, 2024 at 10:41:08PM CET, kuba@kernel.org wrote:
>On Thu, 1 Feb 2024 21:10:46 +0100 Eric Dumazet wrote:
>> > And possibly adding 1ms delay there could be problematic?
>>
>> A conversion to schedule_delayed_work() would be needed I think.
>>
>> I looked at all syzbot reports and did not find
>> devlink_rel_nested_in_notify_work() in them,
>> I guess we were lucky all this time :)
>
>FWIW the devlink_rel_* stuff is for linecards and SIOV sub function
>instances, netdevsim can't fake those so syzbot probably never
>exercises that code :(
>
>Jiri is on CC, so we can consider him notified about the problem
>and leave it to him? :)
Will take care of that.
>
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Jiri Pirko @ 2024-02-02 9:08 UTC
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, syzbot, Jiri Pirko
Thu, Feb 01, 2024 at 06:53:24PM CET, edumazet@google.com wrote:
>Many syzbot reports include the following trace [1]
>
>If nsim_dev_trap_report_work() can not grab the mutex,
>it should rearm itself at least one jiffie later.
>
>[1]
>Sending NMI from CPU 1 to CPUs 0:
>NMI backtrace for cpu 0
>CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
>Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
>Workqueue: events nsim_dev_trap_report_work
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
>Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
>RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
>RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
>RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
>RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
>R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
>R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
>FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>Call Trace:
> <NMI>
> </NMI>
> <TASK>
> instrument_atomic_read include/linux/instrumented.h:68 [inline]
> atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
> queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
> debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
> do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
> __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
> _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
> debug_object_activate+0x349/0x540 lib/debugobjects.c:726
> debug_work_activate kernel/workqueue.c:578 [inline]
> insert_work+0x30/0x230 kernel/workqueue.c:1650
> __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
> __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
> queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
> queue_delayed_work include/linux/workqueue.h:563 [inline]
> schedule_delayed_work include/linux/workqueue.h:677 [inline]
> nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
> process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> process_scheduled_works kernel/workqueue.c:2706 [inline]
> worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
>
>Fixes: 012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini")
>Reported-by: syzbot <syzkaller@googlegroups.com>
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Thanks!
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Jiri Pirko @ 2024-02-02 12:04 UTC
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, syzbot, Jiri Pirko
Thu, Feb 01, 2024 at 06:53:24PM CET, edumazet@google.com wrote:
>Many syzbot reports include the following trace [1]
>
>If nsim_dev_trap_report_work() can not grab the mutex,
>it should rearm itself at least one jiffie later.
>
>[1]
>Sending NMI from CPU 1 to CPUs 0:
>NMI backtrace for cpu 0
>CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
>Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
>Workqueue: events nsim_dev_trap_report_work
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
>Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
>RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
>RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
>RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
>RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
>R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
>R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
>FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>Call Trace:
> <NMI>
> </NMI>
> <TASK>
> instrument_atomic_read include/linux/instrumented.h:68 [inline]
> atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
> queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
> debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
> do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
> __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
> _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
> debug_object_activate+0x349/0x540 lib/debugobjects.c:726
> debug_work_activate kernel/workqueue.c:578 [inline]
> insert_work+0x30/0x230 kernel/workqueue.c:1650
> __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
> __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
> queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
> queue_delayed_work include/linux/workqueue.h:563 [inline]
> schedule_delayed_work include/linux/workqueue.h:677 [inline]
> nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
> process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> process_scheduled_works kernel/workqueue.c:2706 [inline]
> worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
What is actually the reason for this trace? I see that the RIP is on
"start" pointer access in kasan code when instrument_atomic_read()
is called on lock->val during spin_unlock checks. But why?
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Eric Dumazet @ 2024-02-02 12:39 UTC
To: Jiri Pirko
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, syzbot, Jiri Pirko
On Fri, Feb 2, 2024 at 1:04 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Thu, Feb 01, 2024 at 06:53:24PM CET, edumazet@google.com wrote:
> >Many syzbot reports include the following trace [1]
> >
> >If nsim_dev_trap_report_work() can not grab the mutex,
> >it should rearm itself at least one jiffie later.
> >
> >[1]
> >Sending NMI from CPU 1 to CPUs 0:
> >NMI backtrace for cpu 0
> >CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
> >Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> >Workqueue: events nsim_dev_trap_report_work
> > RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
> > RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
> > RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
> > RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
> > RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> > RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
> >Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
> >RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
> >RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
> >RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
> >RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
> >R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
> >R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
> >FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
> >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >Call Trace:
> > <NMI>
> > </NMI>
> > <TASK>
> > instrument_atomic_read include/linux/instrumented.h:68 [inline]
> > atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
> > queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
> > debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
> > do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
> > __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
> > _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
> > debug_object_activate+0x349/0x540 lib/debugobjects.c:726
> > debug_work_activate kernel/workqueue.c:578 [inline]
> > insert_work+0x30/0x230 kernel/workqueue.c:1650
> > __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
> > __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
> > queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
> > queue_delayed_work include/linux/workqueue.h:563 [inline]
> > schedule_delayed_work include/linux/workqueue.h:677 [inline]
> > nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
> > process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> > process_scheduled_works kernel/workqueue.c:2706 [inline]
> > worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> > kthread+0x2c6/0x3a0 kernel/kthread.c:388
> > ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> > ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> > </TASK>
>
> What is actually the reason for this trace? I see that the RIP is on
> "start" pointer access in kasan code when instrument_atomic_read()
> is called on lock->val during spin_unlock checks. But why?
This is a watchdog triggering because there are tasks stuck in D state.
NMI backtrace for cpu 1
CPU: 1 PID: 29 Comm: khungtaskd Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
nmi_cpu_backtrace+0x277/0x390 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x299/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xf87/0x1210 kernel/hung_task.c:379
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Sending NMI from CPU 1 to CPUs 0:
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: Jiri Pirko @ 2024-02-02 12:56 UTC
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, syzbot, Jiri Pirko
Fri, Feb 02, 2024 at 01:39:26PM CET, edumazet@google.com wrote:
>On Fri, Feb 2, 2024 at 1:04 PM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Thu, Feb 01, 2024 at 06:53:24PM CET, edumazet@google.com wrote:
>> >Many syzbot reports include the following trace [1]
>> >
>> >If nsim_dev_trap_report_work() can not grab the mutex,
>> >it should rearm itself at least one jiffie later.
>> >
>> >[1]
>> >Sending NMI from CPU 1 to CPUs 0:
>> >NMI backtrace for cpu 0
>> >CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
>> >Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
>> >Workqueue: events nsim_dev_trap_report_work
>> > RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
>> > RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
>> > RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
>> > RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
>> > RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
>> > RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
>> >Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
>> >RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
>> >RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
>> >RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
>> >RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
>> >R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
>> >R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
>> >FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
>> >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
>> >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> >Call Trace:
>> > <NMI>
>> > </NMI>
>> > <TASK>
>> > instrument_atomic_read include/linux/instrumented.h:68 [inline]
>> > atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
>> > queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
>> > debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
>> > do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
>> > __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
>> > _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
>> > debug_object_activate+0x349/0x540 lib/debugobjects.c:726
>> > debug_work_activate kernel/workqueue.c:578 [inline]
>> > insert_work+0x30/0x230 kernel/workqueue.c:1650
>> > __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
>> > __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
>> > queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
>> > queue_delayed_work include/linux/workqueue.h:563 [inline]
>> > schedule_delayed_work include/linux/workqueue.h:677 [inline]
>> > nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
>> > process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
>> > process_scheduled_works kernel/workqueue.c:2706 [inline]
>> > worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
>> > kthread+0x2c6/0x3a0 kernel/kthread.c:388
>> > ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
>> > ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
>> > </TASK>
>>
>> What is actually the reason for this trace? I see that the RIP is on
>> "start" pointer access in kasan code when instrument_atomic_read()
>> is called on lock->val during spin_unlock checks. But why?
>
>This is a watchdog triggering because there are tasks stuck in D state.
>
> NMI backtrace for cpu 1
>CPU: 1 PID: 29 Comm: khungtaskd Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
>Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
>Call Trace:
><TASK>
>__dump_stack lib/dump_stack.c:88 [inline]
>dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
>nmi_cpu_backtrace+0x277/0x390 lib/nmi_backtrace.c:113
>nmi_trigger_cpumask_backtrace+0x299/0x300 lib/nmi_backtrace.c:62
>trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
>check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
>watchdog+0xf87/0x1210 kernel/hung_task.c:379
>kthread+0x2c6/0x3a0 kernel/kthread.c:388
>ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
>ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
></TASK>
>Sending NMI from CPU 1 to CPUs 0:
Ah, got it. Thanks!
* Re: [PATCH net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
From: patchwork-bot+netdevbpf @ 2024-02-02 19:10 UTC
To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, eric.dumazet, syzkaller, jiri
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 1 Feb 2024 17:53:24 +0000 you wrote:
> Many syzbot reports include the following trace [1]
>
> If nsim_dev_trap_report_work() can not grab the mutex,
> it should rearm itself at least one jiffie later.
>
> [1]
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> Workqueue: events nsim_dev_trap_report_work
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
> Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
> RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
> RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
> RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
> RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
> R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
> R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
> FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <NMI>
> </NMI>
> <TASK>
> instrument_atomic_read include/linux/instrumented.h:68 [inline]
> atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
> queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
> debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
> do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
> __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
> _raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
> debug_object_activate+0x349/0x540 lib/debugobjects.c:726
> debug_work_activate kernel/workqueue.c:578 [inline]
> insert_work+0x30/0x230 kernel/workqueue.c:1650
> __queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
> __queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
> queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
> queue_delayed_work include/linux/workqueue.h:563 [inline]
> schedule_delayed_work include/linux/workqueue.h:677 [inline]
> nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
> process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
> process_scheduled_works kernel/workqueue.c:2706 [inline]
> worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
> kthread+0x2c6/0x3a0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> </TASK>
>
> [...]
Here is the summary with links:
- [net] netdevsim: avoid potential loop in nsim_dev_trap_report_work()
https://git.kernel.org/netdev/net/c/ba5e1272142d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html