From: Daniel Borkmann <daniel@iogearbox.net>
To: Yan Zhai <yan@cloudflare.com>
Cc: "Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
"Willem de Bruijn" <willemb@google.com>,
"Simon Horman" <horms@kernel.org>,
"Florian Westphal" <fw@strlen.de>,
"Mina Almasry" <almasrymina@google.com>,
"Abhishek Chauhan" <quic_abchauha@quicinc.com>,
"David Howells" <dhowells@redhat.com>,
"Alexander Lobakin" <aleksander.lobakin@intel.com>,
"David Ahern" <dsahern@kernel.org>,
"Richard Gobert" <richardbgobert@gmail.com>,
"Antoine Tenart" <atenart@kernel.org>,
"Felix Fietkau" <nbd@nbd.name>,
"Soheil Hassas Yeganeh" <soheil@google.com>,
"Pavel Begunkov" <asml.silence@gmail.com>,
"Lorenzo Bianconi" <lorenzo@kernel.org>,
"Thomas Weißschuh" <linux@weissschuh.net>,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [RFC net-next 1/9] skb: introduce gro_disabled bit
Date: Fri, 21 Jun 2024 18:15:01 +0200
Message-ID: <e6553be1-4eaa-e90a-17f8-dece2bb95e7b@iogearbox.net>
In-Reply-To: <CAO3-PbrhnvmdYmQubNsTX3gX917o=Q+MBWTBkxUd=YWt4dNGuA@mail.gmail.com>
On 6/21/24 6:00 PM, Yan Zhai wrote:
> On Fri, Jun 21, 2024 at 8:13 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 6/21/24 2:15 PM, Willem de Bruijn wrote:
>>> Yan Zhai wrote:
>>>> Software GRO is currently controlled by a single switch, i.e.
>>>>
>>>> ethtool -K dev gro on|off
>>>>
>>>> However, this is not always desired. When GRO is enabled, even if the
>>>> kernel cannot GRO certain traffic, it has to run through the GRO receive
>>>> handlers with no benefit.
>>>>
>>>> There are also scenarios that turning off GRO is a requirement. For
>>>> example, our production environment has a scenario that a TC egress hook
>>>> may add multiple encapsulation headers to forwarded skbs for load
>>>> balancing and isolation purpose. The encapsulation is implemented via
>>>> BPF. But the problem arises then: there is no way to properly offload a
>>>> double-encapsulated packet, since skb only has network_header and
>>>> inner_network_header to track one layer of encapsulation, but not two.
>>>> On the other hand, not all the traffic through this device needs double
>>>> encapsulation. But we have to turn off GRO completely for any ingress
>>>> device as a result.
>>>>
>>>> Introduce a bit on skb so that GRO engine can be notified to skip GRO on
>>>> this skb, rather than having to be 0-or-1 for all traffic.
>>>>
>>>> Signed-off-by: Yan Zhai <yan@cloudflare.com>
>>>> ---
>>>> include/linux/netdevice.h | 9 +++++++--
>>>> include/linux/skbuff.h | 10 ++++++++++
>>>> net/Kconfig | 10 ++++++++++
>>>> net/core/gro.c | 2 +-
>>>> net/core/gro_cells.c | 2 +-
>>>> net/core/skbuff.c | 4 ++++
>>>> 6 files changed, 33 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index c83b390191d4..2ca0870b1221 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -2415,11 +2415,16 @@ struct net_device {
>>>> ((dev)->devlink_port = (port)); \
>>>> })
>>>>
>>>> -static inline bool netif_elide_gro(const struct net_device *dev)
>>>> +static inline bool netif_elide_gro(const struct sk_buff *skb)
>>>> {
>>>> - if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog)
>>>> + if (!(skb->dev->features & NETIF_F_GRO) || skb->dev->xdp_prog)
>>>> return true;
>>>> +
>>>> +#ifdef CONFIG_SKB_GRO_CONTROL
>>>> + return skb->gro_disabled;
>>>> +#else
>>>> return false;
>>>> +#endif
>>>
>>> Yet more branches in the hot path.
>>>
>>> Compile time configurability does not help, as that will be
>>> enabled by distros.
>>>
>>> For a fairly niche use case. Where functionality of GRO already
>>> works. So just a performance for a very rare case at the cost of a
>>> regression in the common case. A small regression perhaps, but death
>>> by a thousand cuts.
>>
>> Mentioning it here b/c it perhaps fits in this context, longer time ago
>> there was the idea mentioned to have BPF operating as GRO engine which
>> might also help to reduce attack surface by only having to handle packets
>> of interest for the concrete production use case. Perhaps here meta data
>> buffer could be used to pass a notification from XDP to exit early w/o
>> aggregation.
>
> Metadata is in fact one of our interests as well. We discussed using
> metadata instead of a skb bit to carry this information internally.
> Since metadata is opaque atm so it seems the only option is to have a
> GRO control hook before napi_gro_receive, and let BPF decide
> netif_receive_skb or napi_gro_receive (echo what Paolo said). With BPF
> it could indeed be more flexible, but the cons is that it could be
> even more slower than taking a bit on skb. I am actually open to
> either approach, as long as it gives us more control on when to enable
> GRO :)
Oh wait, one thing that just came to mind: have you tried a u64 per-CPU
counter map in XDP? For packets which should not be GRO-aggregated, you
write count++ into the metadata area. This forces GRO to not aggregate
them, since the metadata that needs to be transported to the tc BPF layer
mismatches from packet to packet (the contract/intent being that tc BPF
needs to see the distinct metadata passed to it).
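
To make the idea concrete, here is a rough userspace model of it in plain
C. This is not kernel or BPF code: struct pkt, cpu_counter, xdp_mark() and
meta_differs() are hypothetical names made up for illustration, and the
comparison only approximates the metadata part of GRO's same-flow check.
It assumes marked packets carry an 8-byte sequence value and unmarked
packets carry no metadata at all.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for an skb: only the XDP metadata area is modelled. */
struct pkt {
	uint8_t meta[8];  /* metadata bytes placed in front of packet data */
	uint8_t meta_len; /* 0 means the XDP program attached no metadata */
};

/* Stand-in for one CPU's slot of a u64 per-CPU counter map (assumption). */
static uint64_t cpu_counter;

/*
 * What the XDP program would do in this model: packets that must not be
 * aggregated get a fresh counter value written into their metadata;
 * all other packets are left without metadata.
 */
static void xdp_mark(struct pkt *p, bool skip_gro)
{
	uint64_t seq;

	p->meta_len = 0;
	if (!skip_gro)
		return;

	seq = cpu_counter++;
	memcpy(p->meta, &seq, sizeof(seq));
	p->meta_len = sizeof(seq);
}

/*
 * Models the metadata comparison: two packets can only be treated as
 * the same flow if their metadata has equal length and equal content.
 */
static bool meta_differs(const struct pkt *a, const struct pkt *b)
{
	if (a->meta_len != b->meta_len)
		return true;
	return memcmp(a->meta, b->meta, a->meta_len) != 0;
}
```

In this model two marked packets always differ because each one gets its
own counter value, so they would never be merged, while two unmarked
packets compare equal and stay eligible for aggregation. A per-CPU
counter should be enough as long as the packets being compared were all
stamped on the same CPU, which is the assumption made here.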
Thanks,
Daniel
Thread overview: 32+ messages
[not found] <cover.1718919473.git.yan@cloudflare.com>
2024-06-20 22:19 ` [RFC net-next 1/9] skb: introduce gro_disabled bit Yan Zhai
2024-06-21 9:11 ` Alexander Lobakin
2024-06-21 15:40 ` Yan Zhai
2024-06-21 9:49 ` Paolo Abeni
2024-06-21 14:29 ` Yan Zhai
2024-06-21 9:57 ` Paolo Abeni
2024-06-21 15:17 ` Yan Zhai
2024-06-21 12:15 ` Willem de Bruijn
2024-06-21 12:47 ` Daniel Borkmann
2024-06-21 16:00 ` Yan Zhai
2024-06-21 16:15 ` Daniel Borkmann [this message]
2024-06-21 17:20 ` Yan Zhai
2024-06-23 8:23 ` Willem de Bruijn
2024-06-24 13:30 ` Daniel Borkmann
2024-06-24 17:49 ` Yan Zhai
2024-06-21 15:34 ` Yan Zhai
2024-06-23 8:27 ` Willem de Bruijn
2024-06-24 18:17 ` Yan Zhai
2024-06-30 13:40 ` Willem de Bruijn
2024-07-03 18:46 ` Yan Zhai
2024-06-20 22:19 ` [RFC net-next 2/9] xdp: add XDP_FLAGS_GRO_DISABLED flag Yan Zhai
2024-06-21 9:15 ` Alexander Lobakin
2024-06-21 16:12 ` Yan Zhai
2024-06-20 22:19 ` [RFC net-next 3/9] xdp: implement bpf_xdp_disable_gro kfunc Yan Zhai
2024-06-20 22:19 ` [RFC net-next 4/9] bnxt: apply XDP offloading fixup when building skb Yan Zhai
2024-06-20 22:19 ` [RFC net-next 5/9] ice: " Yan Zhai
2024-06-21 9:20 ` Alexander Lobakin
2024-06-21 16:05 ` Yan Zhai
2024-06-20 22:19 ` [RFC net-next 6/9] veth: " Yan Zhai
2024-06-20 22:19 ` [RFC net-next 7/9] mlx5: move xdp_buff scope one level up Jesper Dangaard Brouer
2024-06-20 22:19 ` [RFC net-next 8/9] mlx5: apply XDP offloading fixup when building skb Yan Zhai
2024-06-20 22:19 ` [RFC net-next 9/9] bpf: selftests: test disabling GRO by XDP Yan Zhai