From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
Shuah Khan <shuah@kernel.org>,
Network Development <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v6 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
Date: Fri, 07 Jan 2022 16:54:12 +0100 [thread overview]
Message-ID: <87a6g7a6e3.fsf@toke.dk> (raw)
In-Reply-To: <CAADnVQJS-2VdpkPoiXWCDYLV1MC6gk9oQFC+GZYw6jP2umH0Cw@mail.gmail.com>
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Thu, Jan 6, 2022 at 11:59 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> +
>> +#define NUM_PKTS 10
>
> I'm afraid this needs more work.
> Just bumping above to 1M I got:
> [ 254.165911] ================================
> [ 254.166387] WARNING: inconsistent lock state
> [ 254.166882] 5.16.0-rc7-02011-g64923127f1b3 #3784 Tainted: G O
> [ 254.167659] --------------------------------
> [ 254.168140] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> [ 254.168793] swapper/7/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
> [ 254.169373] ffff888113d24220 (&r->producer_lock){+.?.}-{2:2}, at:
> veth_xmit+0x361/0x830
> [ 254.170317] {SOFTIRQ-ON-W} state was registered at:
> [ 254.170921] lock_acquire+0x18a/0x450
> [ 254.171371] _raw_spin_lock+0x2f/0x40
> [ 254.171815] veth_xdp_xmit+0x1d7/0x8c0
> [ 254.172241] veth_ndo_xdp_xmit+0x1d/0x50
> [ 254.172689] bq_xmit_all+0x562/0xc30
> [ 254.173159] __dev_flush+0xb1/0x220
> [ 254.173586] xdp_do_flush+0xa/0x20
> [ 254.173983] xdp_test_run_batch.constprop.25+0x90c/0xf00
> [ 254.174564] bpf_test_run_xdp_live+0x369/0x480
> [ 254.175038] bpf_prog_test_run_xdp+0x63f/0xe50
> [ 254.175512] __sys_bpf+0x688/0x4410
> [ 254.175923] __x64_sys_bpf+0x75/0xb0
> [ 254.176327] do_syscall_64+0x34/0x80
> [ 254.176733] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 254.177297] irq event stamp: 130862
> [ 254.177681] hardirqs last enabled at (130862):
> [<ffffffff812d0812>] call_rcu+0x2a2/0x640
> [ 254.178561] hardirqs last disabled at (130861):
> [<ffffffff812d08bd>] call_rcu+0x34d/0x640
> [ 254.179404] softirqs last enabled at (130814):
> [<ffffffff83c00534>] __do_softirq+0x534/0x835
> [ 254.180332] softirqs last disabled at (130839):
> [<ffffffff811389f7>] irq_exit_rcu+0xe7/0x120
> [ 254.181255]
> [ 254.181255] other info that might help us debug this:
> [ 254.181969] Possible unsafe locking scenario:
> [ 254.183172] lock(&r->producer_lock);
> [ 254.183590] <Interrupt>
> [ 254.183893] lock(&r->producer_lock);
> [ 254.184321]
> [ 254.184321] *** DEADLOCK ***
> [ 254.184321]
> [ 254.185047] 5 locks held by swapper/7/0:
> [ 254.185501] #0: ffff8881f6d89db8 ((&ndev->rs_timer)){+.-.}-{0:0},
> at: call_timer_fn+0xc8/0x440
> [ 254.186496] #1: ffffffff854415e0 (rcu_read_lock){....}-{1:2}, at:
> ndisc_send_skb+0x761/0x12e0
> [ 254.187444] #2: ffffffff85441580 (rcu_read_lock_bh){....}-{1:2},
> at: ip6_finish_output2+0x2da/0x1e00
> [ 254.188447] #3: ffffffff85441580 (rcu_read_lock_bh){....}-{1:2},
> at: __dev_queue_xmit+0x1de/0x2910
> [ 254.189502] #4: ffffffff854415e0 (rcu_read_lock){....}-{1:2}, at:
> veth_xmit+0x41/0x830
> [ 254.190455]
> [ 254.190455] stack backtrace:
> [ 254.190963] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G O
> 5.16.0-rc7-02011-g64923127f1b3 #3784
> [ 254.192109] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> [ 254.193427] Call Trace:
> [ 254.193711] <IRQ>
> [ 254.193945] dump_stack_lvl+0x44/0x57
> [ 254.194418] mark_lock.part.54+0x157b/0x2210
> [ 254.194940] ? mark_lock.part.54+0xfd/0x2210
> [ 254.195451] ? print_usage_bug+0x80/0x80
> [ 254.195896] ? rcu_read_lock_sched_held+0x91/0xc0
> [ 254.196413] ? rcu_read_lock_bh_held+0xa0/0xa0
> [ 254.196903] ? rcu_read_lock_bh_held+0xa0/0xa0
> [ 254.197389] ? find_held_lock+0x33/0x1c0
> [ 254.197826] ? lock_release+0x3a1/0x650
> [ 254.198251] ? __stack_depot_save+0x274/0x490
> [ 254.198742] ? lock_acquire+0x19a/0x450
> [ 254.199175] ? lock_downgrade+0x690/0x690
> [ 254.199626] ? do_raw_spin_lock+0x11d/0x270
> [ 254.200091] ? rwlock_bug.part.2+0x90/0x90
> [ 254.200550] __lock_acquire+0x151f/0x6310
> [ 254.201000] ? mark_lock.part.54+0xfd/0x2210
> [ 254.201470] ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
> [ 254.202083] ? lock_is_held_type+0xda/0x130
> [ 254.202592] ? rcu_read_lock_sched_held+0x91/0xc0
> [ 254.203134] ? rcu_read_lock_bh_held+0xa0/0xa0
> [ 254.203630] lock_acquire+0x18a/0x450
> [ 254.204041] ? veth_xmit+0x361/0x830
> [ 254.204455] ? lock_release+0x650/0x650
> [ 254.204932] ? eth_gro_receive+0xc60/0xc60
> [ 254.205421] ? rcu_read_lock_held+0x91/0xa0
> [ 254.205912] _raw_spin_lock+0x2f/0x40
> [ 254.206314] ? veth_xmit+0x361/0x830
> [ 254.206707] veth_xmit+0x361/0x830
>
> I suspect it points out that local_bh_disable is needed
> around xdp_do_flush.
>
> That's why I asked you to test it with something
> more than 3 in NUM_PKTS.
> What values did you test it with? I hope not just 10.
>
> Please make sure XDP_PASS/TX/REDIRECT are all stress tested.
Okay, finally managed to reproduce this; it did not show up at all in my
own tests.
Did you run the old version of the selftest by any chance? At least I
can only reproduce it with the forwarding sysctl enabled; it happens
because the XDP_PASS path races with the XDP_REDIRECT path and end up
trying to grab the same lock, which only happens when the XDP_PASS path
sends the packets back out the same interface. The fix is to extend the
local_bh_disable() to cover the full loop in xdp_test_run_batch().
I'll send an update with that fixed. But I'm not sure what to do about
the selftest? I can keep the forwarding enabled + 1 million iterations -
that seems to trigger the bug fairly reliably for me, but it takes a bit
longer to run. Is that acceptable?
-Toke
next prev parent reply other threads:[~2022-01-07 15:54 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-06 19:59 [PATCH bpf-next v6 0/3] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
2022-01-06 19:59 ` [PATCH bpf-next v6 1/3] bpf: Add "live packet" mode for " Toke Høiland-Jørgensen
2022-01-06 19:59 ` [PATCH bpf-next v6 2/3] selftests/bpf: Move open_netns() and close_netns() into network_helpers.c Toke Høiland-Jørgensen
2022-01-06 19:59 ` [PATCH bpf-next v6 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run() Toke Høiland-Jørgensen
2022-01-07 0:50 ` Alexei Starovoitov
2022-01-07 15:54 ` Toke Høiland-Jørgensen [this message]
2022-01-07 16:05 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a6g7a6e3.fsf@toke.dk \
--to=toke@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=hawk@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=shuah@kernel.org \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.