Intel-Wired-Lan Archive on lore.kernel.org
From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Daniel Xu <dxu@dxuuu.xyz>, Stanislav Fomichev <stfomichev@gmail.com>
Cc: mst@redhat.com, jasowang@redhat.com, ast@kernel.org,
	edumazet@google.com, anthony.l.nguyen@intel.com,
	"Jakub Sitnicki" <jakub@cloudflare.com>,
	daniel@iogearbox.net, kernel-team <kernel-team@cloudflare.com>,
	przemyslaw.kitszel@intel.com, john.fastabend@gmail.com,
	sdf@fomichev.me, intel-wired-lan@lists.osuosl.org,
	kuba@kernel.org, pabeni@redhat.com,
	"Lorenzo Bianconi" <lorenzo@kernel.org>,
	"Yan Zhai" <yan@cloudflare.com>,
	alexandre.torgue@foss.st.com,
	"Arthur Fabre" <afabre@cloudflare.com>,
	netdev@vger.kernel.org,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	tariqt@nvidia.com,
	"Alexander Lobakin" <aleksander.lobakin@intel.com>,
	mcoquelin.stm32@gmail.com, bpf@vger.kernel.org,
	saeedm@nvidia.com, davem@davemloft.net
Subject: Re: [Intel-wired-lan] [RFC bpf-next 0/4] Add XDP rx hw hints support performing XDP_REDIRECT
Date: Fri, 4 Oct 2024 12:38:27 +0200
Message-ID: <038fffa3-1e29-4c6d-9e27-8181865dca46@kernel.org>
In-Reply-To: <2fy5vuewgwkh3o3mx5v4bkrzu6josqylraa4ocgzqib6a7ozt4@hwsuhcibtcb6>



On 04/10/2024 04.13, Daniel Xu wrote:
> On Thu, Oct 03, 2024 at 01:26:08PM GMT, Stanislav Fomichev wrote:
>> On 10/03, Arthur Fabre wrote:
>>> On Thu Oct 3, 2024 at 12:49 AM CEST, Stanislav Fomichev wrote:
>>>> On 10/02, Toke Høiland-Jørgensen wrote:
>>>>> Stanislav Fomichev <stfomichev@gmail.com> writes:
>>>>>
>>>>>> On 10/01, Toke Høiland-Jørgensen wrote:
>>>>>>> Lorenzo Bianconi <lorenzo@kernel.org> writes:
>>>>>>>
>>>>>>>>> On Mon Sep 30, 2024 at 1:49 PM CEST, Lorenzo Bianconi wrote:
>>>>>>>>>>> Lorenzo Bianconi <lorenzo@kernel.org> writes:
>>>>>>>>>>>
[...]
>>>>>>>>>>>>
>>>>>>>>>>>> I like this 'fast' KV approach but I guess we should really evaluate its
>>>>>>>>>>>> impact on performances (especially for xdp) since, based on the kfunc calls
>>>>>>>>>>>> order in the ebpf program, we can have one or multiple memmove/memcpy for
>>>>>>>>>>>> each packet, right?
>>>>>>>>>>>
>>>>>>>>>>> Yes, with Arthur's scheme, performance will be ordering dependent. Using

I really like the *compact* Key-Value (KV) store idea from Arthur.
  - The question is: is it fast enough?

I've promised Arthur that I will XDP micro-benchmark this, if he codes
it up to be usable in the XDP code path.  Listening to the LPC recording,
I heard that Alexei also saw potential and other use-cases for this kind
of fast-and-compact KV approach.

I have high hopes for the performance, as Arthur uses the POPCNT
instruction, which is *very* fast [1].  I checked [2]: on AMD Zen 3 and 4
it has Ops/Latency=1 and a reciprocal throughput of 0.25.

  [1] https://www.agner.org/optimize/blog/read.php?i=853#848
  [2] https://www.agner.org/optimize/instruction_tables.pdf
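
A minimal sketch of why POPCNT matters for a compact KV layout.  This is
an assumption about the general technique, not Arthur's actual code: it
assumes at most 64 keys, a presence bitmap, fixed 8-byte values packed in
key order, and the GCC/Clang builtin __builtin_popcountll().  All names
(kv_store, kv_slot, kv_get, kv_set) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define KV_MAX_KEYS 64

/* Hypothetical compact KV store: one bit per key, values packed in key order. */
struct kv_store {
	uint64_t present;           /* bit k set => key k has a value */
	uint64_t vals[KV_MAX_KEYS]; /* only the first popcount(present) slots are used */
};

/* Slot of a key = number of present keys that are smaller than it. */
static inline unsigned int kv_slot(const struct kv_store *kv, unsigned int key)
{
	uint64_t below = kv->present & ((1ULL << key) - 1);

	return (unsigned int)__builtin_popcountll(below);
}

/* Returns 0 and stores the value in *val, or -1 if the key is absent. */
static int kv_get(const struct kv_store *kv, unsigned int key, uint64_t *val)
{
	if (key >= KV_MAX_KEYS || !(kv->present & (1ULL << key)))
		return -1;
	*val = kv->vals[kv_slot(kv, key)];
	return 0;
}

/* Returns 0 on success, or -1 if the key is out of range. */
static int kv_set(struct kv_store *kv, unsigned int key, uint64_t val)
{
	unsigned int slot, used;

	if (key >= KV_MAX_KEYS)
		return -1;
	slot = kv_slot(kv, key);
	if (!(kv->present & (1ULL << key))) {
		used = (unsigned int)__builtin_popcountll(kv->present);
		/* A new key below existing ones shifts the packed tail up one slot. */
		memmove(&kv->vals[slot + 1], &kv->vals[slot],
			(used - slot) * sizeof(kv->vals[0]));
		kv->present |= 1ULL << key;
	}
	kv->vals[slot] = val;
	return 0;
}
```

In this model a lookup is one mask plus one popcount, while an insert of a
new key moves every value stored for a larger key.  That is the ordering
dependence raised in the quoted text above: setting keys in ascending
order never moves data, setting them in descending order moves the whole
tail each time.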

[...]
>>> There are two different use-cases for the metadata:
>>>
>>> * "Hardware" metadata (like the hash, rx_timestamp...). There are only a
>>>    few well known fields, and only XDP can access them to set them as
>>>    metadata, so storing them in a struct somewhere could make sense.
>>>
>>> * Arbitrary metadata used by services. Eg a TC filter could set a field
>>>    describing which service a packet is for, and that could be reused for
>>>    iptables, routing, socket dispatch...
>>>    Similarly we could set a "packet_id" field that uniquely identifies a
>>>    packet so we can trace it throughout the network stack (through
>>>    clones, encap, decap, userspace services...).
>>>    The skb->mark, but with more room, and better support for sharing it.
>>>
>>> We can only know the layout ahead of time for the first one. And they're
>>> similar enough in their requirements (need to be stored somewhere in the
>>> SKB, have a way of retrieving each one individually) that it seems to
>>> make sense to use a common API.
>>
>> Why not have the following layout then?
>>
>> +---------------+-------------------+----------------------------------------+------+
>> | more headroom | user-defined meta | hw-meta (potentially fixed skb format) | data |
>> +---------------+-------------------+----------------------------------------+------+
>>                  ^                                                            ^
>>              data_meta                                                      data
>>
>> You obviously still have a problem of communicating the layout if you
>> have some redirects in between, but you, in theory still have this
>> problem with user-defined metadata anyway (unless I'm missing
>> something).
>>

Hmm, I think you are missing something... As far as I'm concerned we are
discussing placing the KV data after the xdp_frame, and not in the XDP
data_meta area (as your drawing suggests).  The xdp_frame is stored at
the very top of the headroom.  Lorenzo's patchset extends struct
xdp_frame, and now we are discussing how we can make a more flexible API
for extending this.  I understand that Toke confirmed this here [3].  Let
me know if I missed something :-)

  [3] https://lore.kernel.org/all/874j62u1lb.fsf@toke.dk/
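
To make the placement concrete, here is a rough user-space model of the
buffer regions as byte offsets.  It is a sketch under assumptions, not
kernel code: frame_model is a hypothetical stand-in for struct xdp_frame
(the real struct has different fields), and layout_model and
layout_compute() are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xdp_frame; the real struct differs. */
struct frame_model {
	uint32_t len;
	uint32_t headroom;
	uint32_t metasize;
};

/* Byte offsets of each region, measured from the start of the buffer. */
struct layout_model {
	size_t frame_off;     /* frame struct at the very top of the headroom */
	size_t kv_off;        /* KV data directly after the frame struct */
	size_t kv_end;        /* first byte past the KV data */
	size_t data_meta_off; /* BPF-writable metadata, just before packet data */
	size_t data_off;      /* start of packet data */
};

/* Returns 0 and fills *l, or -1 if the regions would overlap. */
static int layout_compute(size_t headroom, size_t kv_size, size_t meta_size,
			  struct layout_model *l)
{
	size_t fixed;

	if (kv_size > headroom)
		return -1;
	fixed = sizeof(struct frame_model) + kv_size;
	if (fixed > headroom || meta_size > headroom - fixed)
		return -1;

	l->frame_off = 0;
	l->kv_off = sizeof(struct frame_model);
	l->kv_end = fixed;
	l->data_off = headroom;
	l->data_meta_off = headroom - meta_size;
	return 0;
}
```

In this model the KV data and data_meta sit at opposite ends of the
headroom: the KV region starts right after the frame struct, while
data_meta ends where the packet data begins, with the remaining headroom
in between.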

As part of designing this flexible API, we/Toke are trying hard not to
tie this to a specific data area.  This is good API design: it keeps
things flexible enough that we can move them around should the need arise.

I don't think it is viable to store this KV data in the XDP data_meta
area, because existing BPF progs already have direct memory (write)
access to it and can change the size of the area.  Existing BPF progs
could thus unintentionally break the KV store, which would then need
extensive checks to handle random corruption (slowing down the KV-store
code).

--Jesper
