From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: "Nelson, Shannon" <shannon.nelson@amd.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
David Ahern <dsahern@gmail.com>, Jakub Kicinski <kuba@kernel.org>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>
Subject: Re: BPF/XDP: kernel panic when removing an interface that is an xdp_redirect target
Date: Thu, 09 Nov 2023 00:10:59 +0100 [thread overview]
Message-ID: <871qczx2m4.fsf@toke.dk> (raw)
In-Reply-To: <fa95d5d0-35c0-497e-aea8-a35f9f6304f4@amd.com>
"Nelson, Shannon" <shannon.nelson@amd.com> writes:
> On 11/7/2023 7:31 AM, Toke Høiland-Jørgensen wrote:
>>
>> "Nelson, Shannon" <shannon.nelson@amd.com> writes:
>>
>>> While testing new code to support XDP in the ionic driver we found that
>>> we could panic the kernel by running a bind/unbind loop on the target
>>> interface of an xdp_redirect action. Obviously this is a stress test
>>> that is abusing the system, but it does point to a window of opportunity
>>> in bq_enqueue() and bq_xmit_all(). I believe that while the validity of
>>> the target interface has been checked in __xdp_enqueue(), the interface
>>> can be unbound by the time either bq_enqueue() or bq_xmit_all() tries to
>>> use the interface. There is no locking or reference taken on the
>>> interface to hold it in place before the target’s ndo_xdp_xmit() is called.
>>>
>>> Below is a stack trace that our tester captured while running our test
>>> code on a RHEL 9.2 kernel – yes, I know, unpublished driver code on a
>>> non-upstream kernel. But if you look at the current upstream code in
>>> kernel/bpf/devmap.c I think you can see what we ran into.
>>>
>>> Other than telling users to not abuse the system with a bind/unbind
>>> loop, is there something we can do to limit the potential pain here?
>>> Without knowing what interfaces might be targeted by the users’ XDP
>>> programs, is there a step the originating driver can do to take
>>> precautions? Did we simply miss a step in the driver, or is this an
>>> actual problem in the devmap code?
>>
>> Sounds like a driver bug :)
>
> Entirely possible, wouldn't be our first ... :-)
>
>>
>> The XDP redirect flow guarantees that all outstanding packets are
>> flushed within a single NAPI cycle, as documented here:
>> https://docs.kernel.org/bpf/redirect.html
>>
>> So basically, the driver should be doing a two-step teardown: remove
>> global visibility of the resource in question, wait for all concurrent
>> users to finish, and *then* free the data structure. This corresponds to
>> the usual RCU protection: resources should be kept alive until all
>> concurrent RCU critical sections have exited on all CPUs. So if your
>> driver is removing an interface's data structure without waiting for
>> concurrent NAPI cycles to finish, that's a bug in the driver.
>>
>> This kind of thing is what the synchronize_net() function is for; for a
>> usage example, see veth_napi_del_range(). My guess would be that you're
>> missing this as part of your driver teardown flow?
>
> Essentially, the first thing we do in the remove function is to call
> unregister_netdev(), which has synchronize_net() in the path, so I don't
> think this is missing from our scenario, but thanks for the hint, I'll
> keep this in mind. I do see there are a couple of net drivers that are
> more aggressive about calling it directly in some other parts of the
> logic - I don't think that has a bearing on this issue, but I'll keep it
> in mind.
Hmm, right, in fact unregister_netdev() has two such synchronize_net()
calls. The XDP queue is only guaranteed to be flushed after the second
one of those, though, and there's an 'ndo_uninit()' callback in-between
them. So I don't suppose your driver implements that ndo and does
something there that could cause the crash you're seeing?
Otherwise, the one thing I can think of is that maybe it can be related
to the fact that synchronize_net() turns into a
synchronize_rcu_expedited() if the rtnl lock is held (which it is in
this case if you're calling the parameter-less unregister_netdev()). I'm
not quite sure I grok the expedited wait thing, but it should be pretty
easy to check if this is the cause by making a change like the one below
and seeing if the issue goes away.
-Toke
diff --git a/net/core/dev.c b/net/core/dev.c
index e28a18e7069b..1a035a5f0b0e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10932,7 +10932,7 @@ void synchronize_net(void)
{
might_sleep();
if (rtnl_is_locked())
- synchronize_rcu_expedited();
+ synchronize_rcu();
else
synchronize_rcu();
}
next prev parent reply other threads:[~2023-11-08 23:11 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-06 23:03 BPF/XDP: kernel panic when removing an interface that is an xdp_redirect target Nelson, Shannon
2023-11-07 15:31 ` Toke Høiland-Jørgensen
2023-11-08 21:30 ` Nelson, Shannon
2023-11-08 23:10 ` Toke Høiland-Jørgensen [this message]
2023-11-09 1:52 ` Jakub Kicinski
2023-11-09 2:11 ` Nelson, Shannon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871qczx2m4.fsf@toke.dk \
--to=toke@redhat.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=dsahern@gmail.com \
--cc=hawk@kernel.org \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=shannon.nelson@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox