netdev.vger.kernel.org archive mirror
* [PATCH RFC bpf-next 00/20] traits: Per packet metadata KV store
@ 2025-03-05 14:31 arthur
From: arthur @ 2025-03-05 14:31 UTC
  To: netdev, bpf
  Cc: jakub, hawk, yan, jbrandeburg, thoiland, lbiancon, Arthur Fabre

Currently, the only way to attach information to an sk_buff that travels
through the network stack is the mark field. This 32-bit field is highly
versatile: it can be matched in firewall rules, used to drive routing
decisions, and accessed by BPF programs.

However, its limited capacity means that different users compete for the
same 32 bits, which restricts its practical use.
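
As an illustration of that contention, the sketch below shows how two
users might partition the mark with masks. The field names, widths, and
bit positions are hypothetical; the point is that every user has to know
every other user's layout, and whoever comes last only gets the bits that
are left.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical partitioning of the 32-bit skb mark between two users.
 * Every user of the mark must agree on this layout in advance. */
#define FW_ZONE_SHIFT   0
#define FW_ZONE_MASK    (0xffu << FW_ZONE_SHIFT)    /* bits 0-7: firewall zone */
#define ROUTE_TBL_SHIFT 8
#define ROUTE_TBL_MASK  (0xfffu << ROUTE_TBL_SHIFT) /* bits 8-19: routing table */
/* Bits 20-31 are all that is left for any further user. */

static_assert((FW_ZONE_MASK & ROUTE_TBL_MASK) == 0, "fields must not overlap");

/* Replace one field in the mark, leaving the other users' bits intact.
 * Values wider than the field are silently truncated by the mask. */
static uint32_t mark_set_field(uint32_t mark, uint32_t mask, unsigned int shift,
			       uint32_t value)
{
	return (mark & ~mask) | ((value << shift) & mask);
}

/* Extract one field from the mark. */
static uint32_t mark_get_field(uint32_t mark, uint32_t mask, unsigned int shift)
{
	return (mark & mask) >> shift;
}
```

With both fields allocated, a third user has only 12 bits to work with,
and nothing enforces that it stays out of the bits already taken.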

To remedy this, we propose using part of the packet headroom to store 
metadata. This would allow:
- Tracing packets through the network stack and across the kernel-user
  space boundary, by assigning them a unique ID.
- Metadata-driven packet redirection, routing, and socket steering with
  early classification in XDP.
- Extracting information from encapsulation headers and sharing it with
  user space or vice versa.
- Exposing XDP RX metadata, such as the RX timestamp, to the rest of the
  network stack.

We originally proposed extending XDP metadata (binary blob storage, also
in the headroom) to expose it throughout the network stack. However,
feedback at LPC 2024 [1] was that:
- sharing a binary blob amongst different applications is hard, and
- exposing a binary blob to userspace is awkward.
We have therefore shifted to a limited KV store in the headroom.

To differentiate this from the overloaded "metadata" term, it's 
tentatively called "packet traits".

A get() / set() / delete() API is exposed to BPF to store, retrieve,
and remove traits.
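
To show what such an API involves, the sketch below models a limited KV
store with get() / set() / delete() in plain userspace C. It is a
hypothetical model, not the implementation in these patches: the 64-key
limit, the fixed 8-byte values, the 4-value capacity, and the names
trait_store, trait_get, trait_set, and trait_del are all assumptions made
for illustration. A presence bitmap plus values kept in key order lets
the slot of a key be found with a popcount, while memmove() shifts later
values when a key is inserted or removed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits for this sketch only. */
#define TRAIT_MAX_KEYS 64 /* keys 0..63, one bit each in the bitmap */
#define TRAIT_CAPACITY 4  /* number of values that fit in the headroom */

struct trait_store {
	uint64_t present;                /* bit k set => key k is stored */
	uint64_t values[TRAIT_CAPACITY]; /* values packed in ascending key order */
};

/* Slot of a key = number of stored keys below it (key must be < 64).
 * __builtin_popcountll is a GCC/Clang builtin. */
static unsigned int trait_index(const struct trait_store *ts, unsigned int key)
{
	return (unsigned int)__builtin_popcountll(ts->present &
						  ((UINT64_C(1) << key) - 1));
}

static int trait_get(const struct trait_store *ts, unsigned int key,
		     uint64_t *val)
{
	if (key >= TRAIT_MAX_KEYS)
		return -EINVAL;
	if (!(ts->present & (UINT64_C(1) << key)))
		return -ENOENT;
	*val = ts->values[trait_index(ts, key)];
	return 0;
}

static int trait_set(struct trait_store *ts, unsigned int key, uint64_t val)
{
	unsigned int idx, count;

	if (key >= TRAIT_MAX_KEYS)
		return -EINVAL;
	idx = trait_index(ts, key);
	if (ts->present & (UINT64_C(1) << key)) {
		ts->values[idx] = val; /* existing key: overwrite in place */
		return 0;
	}
	count = (unsigned int)__builtin_popcountll(ts->present);
	if (count >= TRAIT_CAPACITY)
		return -ENOSPC; /* no room left for another value */
	assert(idx <= count);
	/* Shift the values of higher keys up one slot to keep key order. */
	memmove(&ts->values[idx + 1], &ts->values[idx],
		(count - idx) * sizeof(ts->values[0]));
	ts->values[idx] = val;
	ts->present |= UINT64_C(1) << key;
	return 0;
}

static int trait_del(struct trait_store *ts, unsigned int key)
{
	unsigned int idx, count;

	if (key >= TRAIT_MAX_KEYS)
		return -EINVAL;
	if (!(ts->present & (UINT64_C(1) << key)))
		return -ENOENT;
	idx = trait_index(ts, key);
	count = (unsigned int)__builtin_popcountll(ts->present);
	/* Shift the values of higher keys down one slot to close the gap. */
	memmove(&ts->values[idx], &ts->values[idx + 1],
		(count - idx - 1) * sizeof(ts->values[0]));
	ts->present &= ~(UINT64_C(1) << key);
	return 0;
}
```

Keeping values sorted by key makes get() a bitmap test plus a popcount,
at the cost of shifting data on set() and delete(). A real implementation
would also have to decide which value sizes to allow, which this sketch
sidesteps by fixing every value at 8 bytes.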

Initial benchmarks in XDP are promising, with the cost of get() / set()
comparable to that of an indirect function call. See patch 6 "trait:
Replace memmove calls with inline move" for full results.

We envisage adding first-class support for this in netfilter (setting
and checking traits in rules) and routing (selecting routing tables
based on traits) in follow-up work, as well as a first-class userspace
API for storing and retrieving traits.

To co-exist with the existing XDP metadata area, traits are stored at
the start of the headroom:

| xdp_frame | traits | headroom | XDP metadata | data / packet |

Traits and XDP metadata are not allowed to overlap.
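
The no-overlap rule amounts to a bounds check on the free space between
the two regions. The sketch below is a hypothetical model of a single
buffer using byte offsets from its start; struct headroom_layout and its
fields are assumptions for illustration, not the kernel's struct
xdp_frame or struct xdp_buff. Traits grow upwards from the end of the
xdp_frame area, XDP metadata grows downwards from the packet data, and
any growth that would make the two meet is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one buffer, as byte offsets from its start:
 *   | xdp_frame | traits | free headroom | XDP metadata | packet data |
 */
struct headroom_layout {
	size_t frame_size; /* space reserved for the xdp_frame */
	size_t traits_len; /* bytes currently used by traits */
	size_t meta_start; /* offset of the XDP metadata (data_meta) */
	size_t data_start; /* offset of the packet data (data) */
};

static size_t traits_end(const struct headroom_layout *l)
{
	return l->frame_size + l->traits_len;
}

/* Invariant: traits and XDP metadata never overlap. */
static bool layout_valid(const struct headroom_layout *l)
{
	return traits_end(l) <= l->meta_start && l->meta_start <= l->data_start;
}

/* Bytes left between the end of the traits and the start of the metadata. */
static size_t free_headroom(const struct headroom_layout *l)
{
	assert(layout_valid(l));
	return l->meta_start - traits_end(l);
}

/* Grow the traits area by n bytes, refusing to overlap the XDP metadata. */
static bool traits_grow(struct headroom_layout *l, size_t n)
{
	if (n > free_headroom(l))
		return false;
	l->traits_len += n;
	return true;
}

/* Grow the XDP metadata by n bytes towards the traits, analogous to
 * calling bpf_xdp_adjust_meta() with a negative delta. */
static bool meta_grow(struct headroom_layout *l, size_t n)
{
	if (n > free_headroom(l))
		return false;
	l->meta_start -= n;
	return true;
}
```

Because both regions draw from the same free space, storing more traits
leaves less room for XDP metadata, and vice versa.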

Like XDP metadata, this relies on sufficient headroom being available.
Piggybacking on that work, traits are currently only supported:
- On ingress.
- By NIC drivers that support XDP metadata.
- When an XDP program is attached.
This limits the applicability of traits, but future work guaranteeing
sufficient headroom through other means should allow these restrictions
to be lifted.

There are still a number of open questions:
- What sizes of values should be allowed? See patch 1 "trait: limited KV
  store for packet metadata".
- How should we handle skb clones? See patch 16 "trait: Support sk_buffs".
- How should trait keys be allocated? See patch 18 "trait: registration
  API".
- How should traits work with GRO? Could an API let us specify policies 
  for how traits should be merged? See patch 18 "trait: registration
  API".

[1] https://lpc.events/event/18/contributions/1935/

Cc: jakub@cloudflare.com
Cc: hawk@kernel.org
Cc: yan@cloudflare.com
Cc: jbrandeburg@cloudflare.com
Cc: thoiland@redhat.com
Cc: lbiancon@redhat.com

To: netdev@vger.kernel.org
To: bpf@vger.kernel.org

Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
---
Arthur Fabre (19):
      trait: limited KV store for packet metadata
      trait: XDP support
      trait: basic XDP selftest
      trait: basic XDP benchmark
      trait: Replace memcpy calls with inline copies
      trait: Replace memmove calls with inline move
      xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions
      trait: Propagate presence of traits to sk_buff
      bnxt: Propagate trait presence to skb
      ice: Propagate trait presence to skb
      veth: Propagate trait presence to skb
      virtio_net: Propagate trait presence to skb
      mlx5: Propagate trait presence to skb
      xdp generic: Propagate trait presence to skb
      trait: Support sk_buffs
      trait: Allow socket filters to access traits
      trait: registration API
      trait: Sync linux/bpf.h to tools/ for trait registration
      trait: register traits in benchmarks and tests

Jesper Dangaard Brouer (1):
      mlx5: move xdp_buff scope one level up

 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   4 +
 drivers/net/ethernet/intel/ice/ice_txrx.c          |   4 +
 drivers/net/ethernet/intel/ice/ice_xsk.c           |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.c    |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 114 ++++----
 drivers/net/veth.c                                 |   4 +
 drivers/net/virtio_net.c                           |   8 +-
 include/linux/bpf-netns.h                          |  12 +
 include/linux/skbuff.h                             |  33 ++-
 include/net/net_namespace.h                        |   6 +
 include/net/netns/trait.h                          |  22 ++
 include/net/trait.h                                | 288 +++++++++++++++++++++
 include/net/xdp.h                                  |  42 ++-
 include/uapi/linux/bpf.h                           |  26 ++
 kernel/bpf/net_namespace.c                         |  54 ++++
 kernel/bpf/syscall.c                               |  26 ++
 kernel/bpf/verifier.c                              |  39 ++-
 net/core/dev.c                                     |   1 +
 net/core/filter.c                                  |  43 ++-
 net/core/skbuff.c                                  |  25 +-
 net/core/xdp.c                                     |  50 ++++
 tools/include/uapi/linux/bpf.h                     |  26 ++
 tools/testing/selftests/bpf/Makefile               |   2 +
 tools/testing/selftests/bpf/bench.c                |  11 +
 tools/testing/selftests/bpf/bench.h                |   1 +
 .../selftests/bpf/benchs/bench_xdp_traits.c        | 191 ++++++++++++++
 .../testing/selftests/bpf/prog_tests/xdp_traits.c  |  51 ++++
 .../testing/selftests/bpf/progs/bench_xdp_traits.c | 131 ++++++++++
 .../testing/selftests/bpf/progs/test_xdp_traits.c  |  94 +++++++
 31 files changed, 1259 insertions(+), 69 deletions(-)
---
base-commit: 42ba8a49d085e0c2ad50fb9a8ec954c9762b6e01
change-id: 20250305-afabre-traits-010-rfc2-a8e4de0c490b

Best regards,
-- 
Arthur Fabre <afabre@cloudflare.com>



Thread overview: 37+ messages
2025-03-05 14:31 [PATCH RFC bpf-next 00/20] traits: Per packet metadata KV store arthur
2025-03-05 14:31 ` [PATCH RFC bpf-next 01/20] trait: limited KV store for packet metadata arthur
2025-03-07  6:36   ` Alexei Starovoitov
2025-03-07 11:14     ` Arthur Fabre
2025-03-07 17:29       ` Alexei Starovoitov
2025-03-10 14:45         ` Arthur Fabre
2025-03-07 19:24   ` Jakub Sitnicki
2025-03-05 14:31 ` [PATCH RFC bpf-next 02/20] trait: XDP support arthur
2025-03-07 19:13   ` Lorenzo Bianconi
2025-03-10 15:50     ` Arthur Fabre
2025-03-05 14:32 ` [PATCH RFC bpf-next 03/20] trait: basic XDP selftest arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 04/20] trait: basic XDP benchmark arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 05/20] trait: Replace memcpy calls with inline copies arthur
2025-03-10 10:50   ` Lorenzo Bianconi
2025-03-10 15:52     ` Arthur Fabre
2025-03-10 22:15   ` David Laight
2025-03-05 14:32 ` [PATCH RFC bpf-next 06/20] trait: Replace memmove calls with inline move arthur
2025-03-06 10:14   ` Jesper Dangaard Brouer
2025-03-05 14:32 ` [PATCH RFC bpf-next 07/20] xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions arthur
2025-03-05 15:24   ` Alexander Lobakin
2025-03-05 17:02     ` Arthur Fabre
2025-03-06 11:12       ` Jesper Dangaard Brouer
2025-03-10 11:10         ` Lorenzo Bianconi
2025-03-05 14:32 ` [PATCH RFC bpf-next 08/20] trait: Propagate presence of traits to sk_buff arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 09/20] bnxt: Propagate trait presence to skb arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 10/20] ice: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 11/20] veth: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 12/20] virtio_net: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 13/20] mlx5: move xdp_buff scope one level up arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 14/20] mlx5: Propagate trait presence to skb arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 15/20] xdp generic: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 16/20] trait: Support sk_buffs arthur
2025-03-10 11:45   ` Lorenzo Bianconi
2025-03-05 14:32 ` [PATCH RFC bpf-next 17/20] trait: Allow socket filters to access traits arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 18/20] trait: registration API arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 19/20] trait: Sync linux/bpf.h to tools/ for trait registration arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 20/20] trait: register traits in benchmarks and tests arthur
