BPF List
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <hawk@kernel.org>
To: "Daniel Borkmann" <daniel@iogearbox.net>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Edward Cree" <ecree.xilinx@gmail.com>
Cc: Yan Zhai <yan@cloudflare.com>,
	Stanislav Fomichev <sdf@google.com>,
	Netdev <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Sitnicki <jakub@cloudflare.com>
Subject: Re: Does skb_metadata_differs really need to stop GRO aggregation?
Date: Thu, 30 Nov 2023 21:35:01 +0100	[thread overview]
Message-ID: <e3402045-a36f-461f-8eab-bbc51735492d@kernel.org> (raw)
In-Reply-To: <8677db3e-5662-7ebe-5af0-e5a3ca60587f@iogearbox.net>



On 11/30/23 17:32, Daniel Borkmann wrote:
> On 11/30/23 2:55 PM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes
>>> On 11/29/23 10:52 PM, Toke Høiland-Jørgensen wrote:
>>>> Edward Cree <ecree.xilinx@gmail.com> writes:
>>>>> On 28/11/2023 14:39, Toke Høiland-Jørgensen wrote:
>>>>>> I'm not quite sure what should be the semantics of that, though. 
>>>>>> I.e.,
>>>>>> if you are trying to aggregate two packets that have the flag set, 
>>>>>> which
>>>>>> packet do you take the value from? What if only one packet has the 
>>>>>> flag
>>>
>>> It would probably make sense if both packets have it set.
>>
>> Right, so "aggregate only if both packets have the flag set, keeping the
>> metadata area from the first packet", then?
> 
> Yes, sgtm.
> 

Seems like a good default behavior: "keeping the metadata area from the 
first packet".
(Please object if someone sees a issue for their use-case with this 
default.)


>>>>>> set? Or should we instead have a "metadata_xdp_only" flag that just
>>>>>> prevents the skb metadata field from being set entirely?
>>>
>>> What would be the use case compared to resetting meta data right before
>>> we return with XDP_PASS?
>>
>> I was thinking it could save a call to xdp_adjust_meta() to reset it
>> back to zero before PASSing the packet. But okay, that may be of
>> marginal utility.
> 
> Agree, feels too marginal.
>

I should explain our use-case(s) a bit more.
We do want the information to survive XDP_PASS into the SKB.
Its the hole point, as we want to transfer information from XDP layer to
TC-layer and perhaps further all the way to BPF socket filters (I even
heard someone asked for).

I'm trying to get an overview, as I now have multiple product teams that
want to store information across/into differ layer, and they have other
teams that consume this information.

We are exploring more options than only XDP metadata area to store
information.  I have suggested that once an SKB have a socket
associated, then we can switch into using BPF local socket storage
tricks. (The lifetime of XDP metadata is not 100% clear as e.g.
pskb_expand_head clears it via skb_metadata_clear).
All ideas are welcome, e.g. I'm also looking at ability to store
auxiliary/metadata data associated with a dst_entry. And SKB->mark is
already used for other use-cases and isn't big enough. (and then there
is fun crossing a netns boundry).

Let me explain *one* of the concrete use-cases.  As described in [1],
the CF XDP L4 load-balancer Unimog have been extended to a product
called Plurimog that does load-balancing across data-centers "colo's".
When Plurimog redirects to another colo, the original "landing" colo's
ID is carried across (in some encap header) to a Unimog instance.  Thus,
the original landing Colo ID is known to Unimog running in another colo,
but that header is popped, so this info need to be transferred somehow.
I'm told that even the webserver/Nginx need to know the orig/foreign
landing colo ID (here there should be socket associated). For TCP SYN
packets, the layered DOS protecting also need to know foreign landing
colo ID. Other teams/products needs this for accounting, e.g. Traffic
Manager[1], Radar[2] and Capacity planning.


  [1] https://blog.cloudflare.com/meet-traffic-manager/
  [2] https://radar.cloudflare.com/



>>>>> Sounds like what's actually needed is bpf progs inside the GRO engine
>>>>>    to implement the metadata "protocol" prepare and coalesce 
>>>>> callbacks?
>>>>
>>>> Hmm, yes, I guess that would be the most general solution :)
>>>
>>> Feels like a potential good fit, agree, although for just solving the
>>> above sth not requiring extra BPF might be nice as well.
>>
>> Yeah, I agree that just the flag makes sense on its own.

I've mentioned before (e.g. at NetConf) I would really like to see BPF
progs inside the GRO engine, but that is a larger project on its own.
I think it is worth doing eventually, but I likely need a solution to
unblock the "tracing"/debugging use-case, where someone added a
timestamp to XDP metadata and discovered GRO was not working.

I guess, we can do the Plurimog use-case now, as it should be stable for
packets belonging to the same (GRO) flow.

--Jesper

  reply	other threads:[~2023-11-30 20:35 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-28 12:37 Does skb_metadata_differs really need to stop GRO aggregation? Jesper Dangaard Brouer
2023-11-28 13:06 ` Jesper Dangaard Brouer
2023-11-28 13:30   ` Daniel Borkmann
2023-11-28 14:39     ` Toke Høiland-Jørgensen
2023-11-29 18:04       ` Edward Cree
2023-11-29 21:52         ` Toke Høiland-Jørgensen
2023-11-29 23:10           ` Daniel Borkmann
2023-11-30 13:55             ` Toke Høiland-Jørgensen
2023-11-30 16:32               ` Daniel Borkmann
2023-11-30 20:35                 ` Jesper Dangaard Brouer [this message]
2023-11-30 22:00                   ` Martin KaFai Lau
2023-12-01  6:20                   ` Yan Zhai
2023-12-01 17:09                 ` Yan Zhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e3402045-a36f-461f-8eab-bbc51735492d@kernel.org \
    --to=hawk@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=ecree.xilinx@gmail.com \
    --cc=edumazet@google.com \
    --cc=jakub@cloudflare.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@google.com \
    --cc=toke@redhat.com \
    --cc=yan@cloudflare.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox