* [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags
@ 2023-08-02 9:23 edward.cree
2023-08-02 10:22 ` Eric Dumazet
2023-08-04 14:36 ` Tyler Stachecki
0 siblings, 2 replies; 7+ messages in thread
From: edward.cree @ 2023-08-02 9:23 UTC (permalink / raw)
To: davem, kuba, edumazet, pabeni; +Cc: Edward Cree, netdev, Martin Habets
From: Edward Cree <ecree.xilinx@gmail.com>
Cited commit removed the check on the grounds that napi_gro_frags must
not be called by drivers if napi_get_frags failed. But skb can also
be NULL if napi_frags_skb fails to pull the ethernet header ("dropping
impossible skb" message). In this case return GRO_CONSUMED, as
otherwise continuing on would cause a NULL dereference panic in
dev_gro_receive().
Fixes: 1d11fa696733 ("net-gro: remove GRO_DROP")
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
---
An sfc customer has encountered this panic in the wild; we're still
investigating exactly how it happened (we have a reproducer) but it
seems wise to have the core handle this check rather than requiring
it in every driver.
---
net/core/gro.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/core/gro.c b/net/core/gro.c
index 0759277dc14e..0159972038da 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -731,6 +731,9 @@ gro_result_t napi_gro_frags(struct napi_struct *napi)
gro_result_t ret;
struct sk_buff *skb = napi_frags_skb(napi);
+ if (!skb)
+ return GRO_CONSUMED;
+
trace_napi_gro_frags_entry(skb);
ret = napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags
2023-08-02 9:23 [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags edward.cree
@ 2023-08-02 10:22 ` Eric Dumazet
2023-08-16 17:46 ` Edward Cree
2023-08-04 14:36 ` Tyler Stachecki
1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2023-08-02 10:22 UTC (permalink / raw)
To: edward.cree; +Cc: davem, kuba, pabeni, Edward Cree, netdev, Martin Habets
On Wed, Aug 2, 2023 at 11:42 AM <edward.cree@amd.com> wrote:
>
> From: Edward Cree <ecree.xilinx@gmail.com>
>
> Cited commit removed the check on the grounds that napi_gro_frags must
> not be called by drivers if napi_get_frags failed. But skb can also
> be NULL if napi_frags_skb fails to pull the ethernet header ("dropping
> impossible skb" message). In this case return GRO_CONSUMED, as
> otherwise continuing on would cause a NULL dereference panic in
> dev_gro_receive().
>
> Fixes: 1d11fa696733 ("net-gro: remove GRO_DROP")
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
> ---
> An sfc customer has encountered this panic in the wild; we're still
> investigating exactly how it happened (we have a reproducer) but it
> seems wise to have the core handle this check rather than requiring
> it in every driver.
An ethernet driver feeding non-ethernet packets to the upper stacks
seems weird to me,
but given napi_frags_skb() does output a warning, I would say this
patch would be acceptable until the real bug is fixed :/
Note that eth_type_trans() does not double-check that at least
ETH_HLEN bytes are present in skb->data
skb_pull_inline(skb, ETH_HLEN);
So eth_type_trans() would definitely crash.
Not sure why a napi_gro_frags() enabled driver would be allowed to
cook arbitrary packets with length < ETH_HLEN
Mixed feelings here, especially if for some reason the compiler would
not inline napi_frags_skb().
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags
2023-08-02 9:23 [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags edward.cree
2023-08-02 10:22 ` Eric Dumazet
@ 2023-08-04 14:36 ` Tyler Stachecki
2023-08-04 14:45 ` Eric Dumazet
1 sibling, 1 reply; 7+ messages in thread
From: Tyler Stachecki @ 2023-08-04 14:36 UTC (permalink / raw)
To: edward.cree
Cc: davem, kuba, edumazet, pabeni, Edward Cree, netdev, Martin Habets
On Wed, Aug 02, 2023 at 10:23:40AM +0100, edward.cree@amd.com wrote:
> From: Edward Cree <ecree.xilinx@gmail.com>
>
> Cited commit removed the check on the grounds that napi_gro_frags must
> not be called by drivers if napi_get_frags failed. But skb can also
> be NULL if napi_frags_skb fails to pull the ethernet header ("dropping
> impossible skb" message). In this case return GRO_CONSUMED, as
> otherwise continuing on would cause a NULL dereference panic in
> dev_gro_receive().
>
> Fixes: 1d11fa696733 ("net-gro: remove GRO_DROP")
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
> ---
> An sfc customer has encountered this panic in the wild; we're still
> investigating exactly how it happened (we have a reproducer) but it
> seems wise to have the core handle this check rather than requiring
> it in every driver.
> ---
> net/core/gro.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/core/gro.c b/net/core/gro.c
> index 0759277dc14e..0159972038da 100644
> --- a/net/core/gro.c
> +++ b/net/core/gro.c
> @@ -731,6 +731,9 @@ gro_result_t napi_gro_frags(struct napi_struct *napi)
> gro_result_t ret;
> struct sk_buff *skb = napi_frags_skb(napi);
>
> + if (!skb)
> + return GRO_CONSUMED;
> +
> trace_napi_gro_frags_entry(skb);
>
> ret = napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));
Given that this case is rarely hit, and given that there are some performance
concerns raised wrt this change, it may be beneficial to hint that this
branch is unlikely.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags
2023-08-04 14:36 ` Tyler Stachecki
@ 2023-08-04 14:45 ` Eric Dumazet
2023-08-04 16:32 ` Tyler Stachecki
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2023-08-04 14:45 UTC (permalink / raw)
To: Tyler Stachecki
Cc: edward.cree, davem, kuba, pabeni, Edward Cree, netdev,
Martin Habets
On Fri, Aug 4, 2023 at 4:36 PM Tyler Stachecki
<stachecki.tyler@gmail.com> wrote:
>
> Given that this case is rarely hit, and given that there are some performance
> concerns raised wrt this change, it may be beneficial to hint that this
> branch is unlikely.
This is already done, a compiler should already infer this from this code:
if (unlikely(skb_gro_header_hard(skb, hlen))) {
eth = skb_gro_header_slow(skb, hlen, 0);
if (unlikely(!eth)) {
net_warn_ratelimited("%s: dropping impossible skb from %s\n",
__func__, napi->dev->name);
napi_reuse_skb(napi, skb);
return NULL;
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags
2023-08-04 14:45 ` Eric Dumazet
@ 2023-08-04 16:32 ` Tyler Stachecki
0 siblings, 0 replies; 7+ messages in thread
From: Tyler Stachecki @ 2023-08-04 16:32 UTC (permalink / raw)
To: Eric Dumazet
Cc: edward.cree, davem, kuba, pabeni, Edward Cree, netdev,
Martin Habets
On Fri, Aug 04, 2023 at 04:45:13PM +0200, Eric Dumazet wrote:
> This is already done, a compiler should already infer this from this code:
>
> if (unlikely(skb_gro_header_hard(skb, hlen))) {
> eth = skb_gro_header_slow(skb, hlen, 0);
> if (unlikely(!eth)) {
> net_warn_ratelimited("%s: dropping impossible skb from %s\n",
> __func__, napi->dev->name);
> napi_reuse_skb(napi, skb);
> return NULL;
> }
It is a good point - though, FWIW, at least with clang-16 I am observing that
it does not leverage this fact (fully?). Mostly for my own curiosity, I took a
look...
In both cases, the generated code for the NULL check is emitted as a a forward
branch, meaning that (at least on x86), the branch direction hint is rendered
as desired.
However, without unlikely(...), the code for the taken branch is clumped
roughly after the inlined code for napi_frags_finish and before the (hot paths
of) napi_frags_skb.
With unlikely added, the code for the taken NULL check is seated right next to
the code generated for the unlikely paths in your comment. So, it does seem to
have an effect, however minor!
---
Anyways: perhaps the more important note here is that, at least with clang-16,
I can confirm that everything is still inlined with this change.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags
2023-08-02 10:22 ` Eric Dumazet
@ 2023-08-16 17:46 ` Edward Cree
2023-08-16 17:59 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Edward Cree @ 2023-08-16 17:46 UTC (permalink / raw)
To: Eric Dumazet, edward.cree; +Cc: davem, kuba, pabeni, netdev, Martin Habets
On 02/08/2023 11:22, Eric Dumazet wrote:
> On Wed, Aug 2, 2023 at 11:42 AM <edward.cree@amd.com> wrote:
>> An sfc customer has encountered this panic in the wild; we're still
>> investigating exactly how it happened (we have a reproducer) but it
>> seems wise to have the core handle this check rather than requiring
>> it in every driver.
>
> An ethernet driver feeding non-ethernet packets to the upper stacks
> seems weird to me,
...
> Not sure why a napi_gro_frags() enabled driver would be allowed to
> cook arbitrary packets with length < ETH_HLEN
...
> Mixed feelings here
Fwiw we now have some more understanding of what caused this: the
device produced an RX descriptor with a zero in its buffer length
field (there was actually data in the buffer, but a — we think —
firmware bug caused the length to be stored in the wrong place).
And the driver, blithely trusting the device, attached the RX page
to the skb from napi_get_frags, passing the buffer length it had
straight to skb_fill_page_desc without checking it. (After all,
the device had told us through RX event flags that the packet was
TCP, so it had to have seen a complete set of headers.)
This certainly can be fixed in the driver, by adding a length
check before calling napi_gro_frags(), but I see two reasons to
put it in the core instead.
1) The same issue is likely to be present in any napi_gro_frags()
driver; I've just looked at e1000 and mlx4 (as examples) and it
looks like they could both be susceptible if hw misbehaved in a
similar way.
2) The core can recycle the SKB, but drivers can't because
napi_reuse_skb is static. Instead, if they've already called
napi_get_frags when they discover the problem, they have to do
kfree_skb(skb);
napi->skb = NULL;
which is not pretty. Of course we could always export
napi_reuse_skb...
Another open question is whether, if we do put this in the core,
we should do the same for the other RX entry points
(napi_gro_receive, eth_type_trans) that could see something
similar.
Anyway, I'll post v2, with unlikely() per Tyler's analysis, and you
can ack or nack as the mood takes you.
-ed
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags
2023-08-16 17:46 ` Edward Cree
@ 2023-08-16 17:59 ` Eric Dumazet
0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2023-08-16 17:59 UTC (permalink / raw)
To: Edward Cree; +Cc: edward.cree, davem, kuba, pabeni, netdev, Martin Habets
On Wed, Aug 16, 2023 at 7:46 PM Edward Cree <ecree.xilinx@gmail.com> wrote:
>
> On 02/08/2023 11:22, Eric Dumazet wrote:
> > On Wed, Aug 2, 2023 at 11:42 AM <edward.cree@amd.com> wrote:
> >> An sfc customer has encountered this panic in the wild; we're still
> >> investigating exactly how it happened (we have a reproducer) but it
> >> seems wise to have the core handle this check rather than requiring
> >> it in every driver.
> >
> > An ethernet driver feeding non-ethernet packets to the upper stacks
> > seems weird to me,
> ...
> > Not sure why a napi_gro_frags() enabled driver would be allowed to
> > cook arbitrary packets with length < ETH_HLEN
> ...
> > Mixed feelings here
> Fwiw we now have some more understanding of what caused this: the
> device produced an RX descriptor with a zero in its buffer length
> field (there was actually data in the buffer, but a — we think —
> firmware bug caused the length to be stored in the wrong place).
> And the driver, blithely trusting the device, attached the RX page
> to the skb from napi_get_frags, passing the buffer length it had
> straight to skb_fill_page_desc without checking it. (After all,
> the device had told us through RX event flags that the packet was
> TCP, so it had to have seen a complete set of headers.)
> This certainly can be fixed in the driver, by adding a length
> check before calling napi_gro_frags(), but I see two reasons to
> put it in the core instead.
> 1) The same issue is likely to be present in any napi_gro_frags()
> driver; I've just looked at e1000 and mlx4 (as examples) and it
> looks like they could both be susceptible if hw misbehaved in a
> similar way.
Yet, it seems they do not misbehave ?
How XDP will react to such bug btw ?
I would vote to add the fix in a driver first.
If really the same disease seems to be common to more than one vendor,
perhaps we could harden core ?
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-08-16 17:59 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-02 9:23 [RFC PATCH net] net-gro: restore check for NULL skb in napi_gro_frags edward.cree
2023-08-02 10:22 ` Eric Dumazet
2023-08-16 17:46 ` Edward Cree
2023-08-16 17:59 ` Eric Dumazet
2023-08-04 14:36 ` Tyler Stachecki
2023-08-04 14:45 ` Eric Dumazet
2023-08-04 16:32 ` Tyler Stachecki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).