* INFO: rcu detected stall in wg_packet_tx_worker @ 2020-04-26 17:38 syzbot 2020-04-26 17:57 ` syzbot 0 siblings, 1 reply; 11+ messages in thread From: syzbot @ 2020-04-26 17:38 UTC (permalink / raw) To: davem, jhs, jiri, kuba, linux-kernel, netdev, syzkaller-bugs, xiyou.wangcong Hello, syzbot found the following crash on: HEAD commit: b2768df2 Merge branch 'for-linus' of git://git.kernel.org/.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=16141e64100000 kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68 dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a compiler: gcc (GCC) 9.0.0 20181231 (experimental) userspace arch: i386 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 1-...!: (10499 ticks this GP) idle=272/1/0x4000000000000002 softirq=53996/53996 fqs=254 (t=10500 jiffies g=80873 q=139) rcu: rcu_preempt kthread starved for 9955 jiffies! g80873 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 rcu: RCU grace-period kthread stack dump: rcu_preempt I28848 10 2 0x80004000 Call Trace: schedule+0xd0/0x2a0 kernel/sched/core.c:4158 schedule_timeout+0x35c/0x850 kernel/time/timer.c:1898 rcu_gp_fqs_loop kernel/rcu/tree.c:1674 [inline] rcu_gp_kthread+0x9bf/0x1960 kernel/rcu/tree.c:1836 kthread+0x388/0x470 kernel/kthread.c:268 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 NMI backtrace for cpu 1 CPU: 1 PID: 3483 Comm: kworker/1:8 Not tainted 5.7.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-crypt-wg0 wg_packet_tx_worker Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101 nmi_trigger_cpumask_backtrace+0x231/0x27e lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x19b/0x1e5 kernel/rcu/tree_stall.h:254 print_cpu_stall kernel/rcu/tree_stall.h:475 [inline] check_cpu_stall kernel/rcu/tree_stall.h:549 [inline] rcu_pending kernel/rcu/tree.c:3225 [inline] rcu_sched_clock_irq.cold+0x55d/0xcfa kernel/rcu/tree.c:2296 update_process_times+0x25/0x60 kernel/time/timer.c:1727 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176 tick_sched_timer+0x4e/0x140 kernel/time/tick-sched.c:1320 __run_hrtimer kernel/time/hrtimer.c:1520 [inline] __hrtimer_run_queues+0x5ca/0xed0 kernel/time/hrtimer.c:1584 hrtimer_interrupt+0x312/0x770 kernel/time/hrtimer.c:1646 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113 [inline] smp_apic_timer_interrupt+0x15b/0x600 arch/x86/kernel/apic/apic.c:1138 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:sfq_dequeue+0x19f/0xcc0 net/sched/sch_sfq.c:492 Code: 66 44 03 a5 e2 02 00 00 66 44 89 63 1a 48 8b 9d 28 03 00 00 48 8d 7b 12 48 89 f8 48 c1 e8 03 42 0f b6 14 30 48 89 f8 83 e0 07 <83> c0 01 38 d0 7c 08 84 d2 0f 85 2b 0a 00 00 44 0f b7 63 12 48 8b RSP: 0018:ffffc9000b69f4e0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13 RAX: 0000000000000002 RBX: ffff8880986c2370 RCX: ffffffff86466097 RDX: 0000000000000000 RSI: ffffffff86465fd2 RDI: ffff8880986c2382 RBP: ffff888097c2d000 R08: ffff88809b468240 R09: fffff520016d3e7a R10: 0000000000000003 R11: fffff520016d3e79 R12: 0000000000000000 R13: ffff8880986c238a R14: dffffc0000000000 R15: 0000000000000002 dequeue_skb net/sched/sch_generic.c:263 [inline] qdisc_restart net/sched/sch_generic.c:366 [inline] __qdisc_run+0x1ac/0x17b0 net/sched/sch_generic.c:384 __dev_xmit_skb net/core/dev.c:3704 [inline] __dev_queue_xmit+0x1d07/0x30a0 net/core/dev.c:4021 neigh_hh_output include/net/neighbour.h:499 [inline] neigh_output include/net/neighbour.h:508 [inline] ip6_finish_output2+0xfb5/0x25b0 net/ipv6/ip6_output.c:117 __ip6_finish_output+0x442/0xab0 net/ipv6/ip6_output.c:143 ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x239/0x810 net/ipv6/ip6_output.c:176 dst_output include/net/dst.h:435 [inline] ip6_local_out+0xaf/0x1a0 net/ipv6/output_core.c:179 ip6tunnel_xmit include/net/ip6_tunnel.h:160 [inline] udp_tunnel6_xmit_skb+0x6f4/0xcd0 net/ipv6/ip6_udp_tunnel.c:114 send6+0x4e3/0xb20 drivers/net/wireguard/socket.c:164 wg_socket_send_skb_to_peer+0xf5/0x220 drivers/net/wireguard/socket.c:189 wg_packet_create_data_done drivers/net/wireguard/send.c:250 [inline] wg_packet_tx_worker+0x30c/0xc30 drivers/net/wireguard/send.c:278 process_one_work+0x965/0x16a0 kernel/workqueue.c:2268 worker_thread+0x96/0xe20 kernel/workqueue.c:2414 kthread+0x388/0x470 kernel/kthread.c:268 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. syzbot can test patches for this bug, for details see: https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 17:38 INFO: rcu detected stall in wg_packet_tx_worker syzbot @ 2020-04-26 17:57 ` syzbot 2020-04-26 19:40 ` Eric Dumazet 0 siblings, 1 reply; 11+ messages in thread From: syzbot @ 2020-04-26 17:57 UTC (permalink / raw) To: Jason, davem, f.fainelli, gregkh, jason, jhs, jiri, krzk, kuba, kvalo, leon, linux-kernel, linux-kselftest, netdev, shuah, syzkaller-bugs, tglx, vivien.didelot, xiyou.wangcong syzbot has bisected this bug to: commit e7096c131e5161fa3b8e52a650d7719d2857adfd Author: Jason A. Donenfeld <Jason@zx2c4.com> Date: Sun Dec 8 23:27:34 2019 +0000 net: WireGuard secure network tunnel bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000 start commit: b2768df2 Merge branch 'for-linus' of git://git.kernel.org/.. git tree: upstream final crash: https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000 console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000 kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68 dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a userspace arch: i386 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000 Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 17:57 ` syzbot @ 2020-04-26 19:40 ` Eric Dumazet 2020-04-26 19:42 ` Jason A. Donenfeld 0 siblings, 1 reply; 11+ messages in thread From: Eric Dumazet @ 2020-04-26 19:40 UTC (permalink / raw) To: syzbot, Jason, davem, f.fainelli, gregkh, jhs, jiri, krzk, kuba, kvalo, leon, linux-kernel, linux-kselftest, netdev, shuah, syzkaller-bugs, tglx, vivien.didelot, xiyou.wangcong On 4/26/20 10:57 AM, syzbot wrote: > syzbot has bisected this bug to: > > commit e7096c131e5161fa3b8e52a650d7719d2857adfd > Author: Jason A. Donenfeld <Jason@zx2c4.com> > Date: Sun Dec 8 23:27:34 2019 +0000 > > net: WireGuard secure network tunnel > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000 > start commit: b2768df2 Merge branch 'for-linus' of git://git.kernel.org/.. > git tree: upstream > final crash: https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000 > console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68 > dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a > userspace arch: i386 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000 > > Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com > Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > I have not looked at the repro closely, but WireGuard has some workers that might loop forever, cond_resched() might help a bit. diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c index da3b782ab7d31df11e381529b144bcc494234a38..349a71e1907e081c61967c77c9f25a6ec5e57a24 100644 --- a/drivers/net/wireguard/receive.c +++ b/drivers/net/wireguard/receive.c @@ -518,6 +518,7 @@ void wg_packet_decrypt_worker(struct work_struct *work) &PACKET_CB(skb)->keypair->receiving)) ? PACKET_STATE_CRYPTED : PACKET_STATE_DEAD; wg_queue_enqueue_per_peer_napi(skb, state); + cond_resched(); } } diff --git a/drivers/net/wireguard/send.c b/drivers/net/wireguard/send.c index 7348c10cbae3db54bfcb31f23c2753185735f876..f5b88693176c84b4bfdf8c4e05071481a3ce45b5 100644 --- a/drivers/net/wireguard/send.c +++ b/drivers/net/wireguard/send.c @@ -281,6 +281,7 @@ void wg_packet_tx_worker(struct work_struct *work) wg_noise_keypair_put(keypair, false); wg_peer_put(peer); + cond_resched(); } } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 19:40 ` Eric Dumazet @ 2020-04-26 19:42 ` Jason A. Donenfeld 2020-04-26 19:52 ` Jason A. Donenfeld 2020-04-26 20:26 ` Eric Dumazet 0 siblings, 2 replies; 11+ messages in thread From: Jason A. Donenfeld @ 2020-04-26 19:42 UTC (permalink / raw) To: Eric Dumazet Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs, Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo, leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot, Cong Wang On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 4/26/20 10:57 AM, syzbot wrote: > > syzbot has bisected this bug to: > > > > commit e7096c131e5161fa3b8e52a650d7719d2857adfd > > Author: Jason A. Donenfeld <Jason@zx2c4.com> > > Date: Sun Dec 8 23:27:34 2019 +0000 > > > > net: WireGuard secure network tunnel > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000 > > start commit: b2768df2 Merge branch 'for-linus' of git://git.kernel.org/.. > > git tree: upstream > > final crash: https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000 > > console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68 > > dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a > > userspace arch: i386 > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000 > > > > Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com > > Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") > > > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > > > > I have not looked at the repro closely, but WireGuard has some workers > that might loop forever, cond_resched() might help a bit. I'm working on this right now. Having a bit difficult of a time getting it to reproduce locally... The reports show the stall happening always at: static struct sk_buff * sfq_dequeue(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; sfq_index a, next_a; struct sfq_slot *slot; /* No active slots */ if (q->tail == NULL) return NULL; next_slot: a = q->tail->next; slot = &q->slots[a]; Which is kind of interesting, because it's not like that should block or anything, unless there's some kasan faulting happening. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 19:42 ` Jason A. Donenfeld @ 2020-04-26 19:52 ` Jason A. Donenfeld 2020-04-26 19:58 ` Jason A. Donenfeld 2020-04-26 20:26 ` Eric Dumazet 1 sibling, 1 reply; 11+ messages in thread From: Jason A. Donenfeld @ 2020-04-26 19:52 UTC (permalink / raw) To: Eric Dumazet Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs, Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo, leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot, Cong Wang It looks like part of the issue might be that I call udp_tunnel6_xmit_skb while holding rcu_read_lock_bh, in drivers/net/wireguard/socket.c. But I think there's good reason to do so, and udp_tunnel6_xmit_skb should be rcu safe. In fact, every.single.other user of udp_tunnel6_xmit_skb in the kernel uses it with rcu locked. So, hm... ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 19:52 ` Jason A. Donenfeld @ 2020-04-26 19:58 ` Jason A. Donenfeld 0 siblings, 0 replies; 11+ messages in thread From: Jason A. Donenfeld @ 2020-04-26 19:58 UTC (permalink / raw) To: Eric Dumazet Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs, Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo, leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot, Cong Wang On Sun, Apr 26, 2020 at 1:52 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > It looks like part of the issue might be that I call > udp_tunnel6_xmit_skb while holding rcu_read_lock_bh, in > drivers/net/wireguard/socket.c. But I think there's good reason to do > so, and udp_tunnel6_xmit_skb should be rcu safe. In fact, > every.single.other user of udp_tunnel6_xmit_skb in the kernel uses it > with rcu locked. So, hm... In the syzkaller log, it looks like several runs are hitting: run #0: crashed: INFO: rcu detected stall in netlink_sendmsg And other runs are hitting yet different functions. So actually, it's not clear that this is the fault of the call to udp_tunnel6_xmit_skb. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 19:42 ` Jason A. Donenfeld 2020-04-26 19:52 ` Jason A. Donenfeld @ 2020-04-26 20:26 ` Eric Dumazet 2020-04-26 20:38 ` Eric Dumazet 1 sibling, 1 reply; 11+ messages in thread From: Eric Dumazet @ 2020-04-26 20:26 UTC (permalink / raw) To: Jason A. Donenfeld, Eric Dumazet Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs, Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo, leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot, Cong Wang On 4/26/20 12:42 PM, Jason A. Donenfeld wrote: > On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: >> >> >> >> On 4/26/20 10:57 AM, syzbot wrote: >>> syzbot has bisected this bug to: >>> >>> commit e7096c131e5161fa3b8e52a650d7719d2857adfd >>> Author: Jason A. Donenfeld <Jason@zx2c4.com> >>> Date: Sun Dec 8 23:27:34 2019 +0000 >>> >>> net: WireGuard secure network tunnel >>> >>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000 >>> start commit: b2768df2 Merge branch 'for-linus' of git://git.kernel.org/.. >>> git tree: upstream >>> final crash: https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000 >>> console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000 >>> kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68 >>> dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a >>> userspace arch: i386 >>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000 >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000 >>> >>> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com >>> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") >>> >>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection >>> >> >> I have not looked at the repro closely, but WireGuard has some workers >> that might loop forever, cond_resched() might help a bit. > > I'm working on this right now. Having a bit difficult of a time > getting it to reproduce locally... > > The reports show the stall happening always at: > > static struct sk_buff * > sfq_dequeue(struct Qdisc *sch) > { > struct sfq_sched_data *q = qdisc_priv(sch); > struct sk_buff *skb; > sfq_index a, next_a; > struct sfq_slot *slot; > > /* No active slots */ > if (q->tail == NULL) > return NULL; > > next_slot: > a = q->tail->next; > slot = &q->slots[a]; > > Which is kind of interesting, because it's not like that should block > or anything, unless there's some kasan faulting happening. > I am not really sure WireGuard is involved, the repro does not rely on it anyway. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 20:26 ` Eric Dumazet @ 2020-04-26 20:38 ` Eric Dumazet 2020-04-26 20:46 ` Jason A. Donenfeld 0 siblings, 1 reply; 11+ messages in thread From: Eric Dumazet @ 2020-04-26 20:38 UTC (permalink / raw) To: Jason A. Donenfeld Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs, Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo, leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot, Cong Wang On 4/26/20 1:26 PM, Eric Dumazet wrote: > > > On 4/26/20 12:42 PM, Jason A. Donenfeld wrote: >> On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> >>> >>> >>> On 4/26/20 10:57 AM, syzbot wrote: >>>> syzbot has bisected this bug to: >>>> >>>> commit e7096c131e5161fa3b8e52a650d7719d2857adfd >>>> Author: Jason A. Donenfeld <Jason@zx2c4.com> >>>> Date: Sun Dec 8 23:27:34 2019 +0000 >>>> >>>> net: WireGuard secure network tunnel >>>> >>>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000 >>>> start commit: b2768df2 Merge branch 'for-linus' of git://git.kernel.org/.. >>>> git tree: upstream >>>> final crash: https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000 >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000 >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68 >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a >>>> userspace arch: i386 >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000 >>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000 >>>> >>>> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com >>>> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") >>>> >>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection >>>> >>> >>> I have not looked at the repro closely, but WireGuard has some workers >>> that might loop forever, cond_resched() might help a bit. >> >> I'm working on this right now. Having a bit difficult of a time >> getting it to reproduce locally... >> >> The reports show the stall happening always at: >> >> static struct sk_buff * >> sfq_dequeue(struct Qdisc *sch) >> { >> struct sfq_sched_data *q = qdisc_priv(sch); >> struct sk_buff *skb; >> sfq_index a, next_a; >> struct sfq_slot *slot; >> >> /* No active slots */ >> if (q->tail == NULL) >> return NULL; >> >> next_slot: >> a = q->tail->next; >> slot = &q->slots[a]; >> >> Which is kind of interesting, because it's not like that should block >> or anything, unless there's some kasan faulting happening. >> > > I am not really sure WireGuard is involved, the repro does not rely on it anyway. > Yes, do not spend too much time on this. syzbot found its way into crazy qdisc settings these last days. ( I sent a patch yesterday for choke qdisc, it seems similar checks are needed in sfq ) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 20:38 ` Eric Dumazet @ 2020-04-26 20:46 ` Jason A. Donenfeld 2020-04-26 21:53 ` Eric Dumazet 0 siblings, 1 reply; 11+ messages in thread From: Jason A. Donenfeld @ 2020-04-26 20:46 UTC (permalink / raw) To: Eric Dumazet Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs, Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo, leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot, Cong Wang On Sun, Apr 26, 2020 at 2:38 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 4/26/20 1:26 PM, Eric Dumazet wrote: > > > > > > On 4/26/20 12:42 PM, Jason A. Donenfeld wrote: > >> On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > >>> > >>> > >>> > >>> On 4/26/20 10:57 AM, syzbot wrote: > >>>> syzbot has bisected this bug to: > >>>> > >>>> commit e7096c131e5161fa3b8e52a650d7719d2857adfd > >>>> Author: Jason A. Donenfeld <Jason@zx2c4.com> > >>>> Date: Sun Dec 8 23:27:34 2019 +0000 > >>>> > >>>> net: WireGuard secure network tunnel > >>>> > >>>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000 > >>>> start commit: b2768df2 Merge branch 'for-linus' of git://git.kernel.org/.. > >>>> git tree: upstream > >>>> final crash: https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000 > >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000 > >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68 > >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a > >>>> userspace arch: i386 > >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000 > >>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000 > >>>> > >>>> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com > >>>> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") > >>>> > >>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection > >>>> > >>> > >>> I have not looked at the repro closely, but WireGuard has some workers > >>> that might loop forever, cond_resched() might help a bit. > >> > >> I'm working on this right now. Having a bit difficult of a time > >> getting it to reproduce locally... > >> > >> The reports show the stall happening always at: > >> > >> static struct sk_buff * > >> sfq_dequeue(struct Qdisc *sch) > >> { > >> struct sfq_sched_data *q = qdisc_priv(sch); > >> struct sk_buff *skb; > >> sfq_index a, next_a; > >> struct sfq_slot *slot; > >> > >> /* No active slots */ > >> if (q->tail == NULL) > >> return NULL; > >> > >> next_slot: > >> a = q->tail->next; > >> slot = &q->slots[a]; > >> > >> Which is kind of interesting, because it's not like that should block > >> or anything, unless there's some kasan faulting happening. > >> > > > > I am not really sure WireGuard is involved, the repro does not rely on it anyway. > > > > Yes, do not spend too much time on this. > > syzbot found its way into crazy qdisc settings these last days. > > ( I sent a patch yesterday for choke qdisc, it seems similar checks are needed in sfq ) Ah, whew, okay. I had just begun instrumenting sfq (the highly technical term for "adding printks everywhere") to figure out what's going on. Looks like you've got a handle on it, so I'll let you have at it. On the brighter side, it seems like Dmitry's and my effort to get full coverage of WireGuard has paid off in the sense that tons of packets wind up being shoveled through it in one way or another, which is good. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 20:46 ` Jason A. Donenfeld @ 2020-04-26 21:53 ` Eric Dumazet 2020-05-04 11:23 ` Jason A. Donenfeld 0 siblings, 1 reply; 11+ messages in thread From: Eric Dumazet @ 2020-04-26 21:53 UTC (permalink / raw) To: Jason A. Donenfeld, Eric Dumazet Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs, Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo, leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot, Cong Wang On 4/26/20 1:46 PM, Jason A. Donenfeld wrote: > > Ah, whew, okay. I had just begun instrumenting sfq (the highly > technical term for "adding printks everywhere") to figure out what's > going on. Looks like you've got a handle on it, so I'll let you have > at it. Yes, syzbot manages to put a zero in q->scaled_quantum I will send a fix. > > On the brighter side, it seems like Dmitry's and my effort to get full > coverage of WireGuard has paid off in the sense that tons of packets > wind up being shoveled through it in one way or another, which is > good. > Sure ! ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: INFO: rcu detected stall in wg_packet_tx_worker 2020-04-26 21:53 ` Eric Dumazet @ 2020-05-04 11:23 ` Jason A. Donenfeld 0 siblings, 0 replies; 11+ messages in thread From: Jason A. Donenfeld @ 2020-05-04 11:23 UTC (permalink / raw) To: Eric Dumazet Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, Jamal Hadi Salim, Jiří Pírko, Krzysztof Kozlowski, Jakub Kicinski, kvalo, leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot, Cong Wang So in spite of this Syzkaller bug being unrelated in the end, I've continued to think about the stacktrace a bit, and combined with some other [potentially false alarm] bug reports I'm trying to wrap my head around, I'm a bit a curious about ideal usage for the udp_tunnel API. All the uses I've seen in the kernel (including wireguard) follow this pattern: rcu_read_lock_bh(); sock = rcu_dereference(obj->sock); ... udp_tunnel_xmit_skb(..., sock, ...); rcu_read_unlock_bh(); udp_tunnel_xmit_skb calls iptunnel_xmit, which winds up in the usual ip_local_out path, which eventually winds up calling some other devices' ndo_xmit, or gets queued up in a qdisc. Calls to udp_tunnel_xmit_skb aren't exactly cheap. So I wonder: is holding the rcu lock for all that time really a good thing? A different pattern that avoids holding the rcu lock would be: rcu_read_lock_bh(); sock = rcu_dereference(obj->sock); sock_hold(sock); rcu_read_unlock_bh(); ... udp_tunnel_xmit_skb(..., sock, ...); sock_put(sock); This seems better, but I wonder if it has some drawbacks too. For example, sock_put has some comment that warns against incrementing it in response to forwarded packets. And if this isn't necessary to do, it's marginally more costly than the first pattern. Any opinions about this? Jason ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-05-04 11:23 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-04-26 17:38 INFO: rcu detected stall in wg_packet_tx_worker syzbot 2020-04-26 17:57 ` syzbot 2020-04-26 19:40 ` Eric Dumazet 2020-04-26 19:42 ` Jason A. Donenfeld 2020-04-26 19:52 ` Jason A. Donenfeld 2020-04-26 19:58 ` Jason A. Donenfeld 2020-04-26 20:26 ` Eric Dumazet 2020-04-26 20:38 ` Eric Dumazet 2020-04-26 20:46 ` Jason A. Donenfeld 2020-04-26 21:53 ` Eric Dumazet 2020-05-04 11:23 ` Jason A. Donenfeld
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).