From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
To: arthur@arthurfabre.com
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
jakub@cloudflare.com, hawk@kernel.org, yan@cloudflare.com,
jbrandeburg@cloudflare.com, thoiland@redhat.com,
lbiancon@redhat.com, Arthur Fabre <afabre@cloudflare.com>
Subject: Re: [PATCH RFC bpf-next 16/20] trait: Support sk_buffs
Date: Mon, 10 Mar 2025 12:45:03 +0100 [thread overview]
Message-ID: <Z87Qv2Jf3p3MeXRC@lore-desk> (raw)
In-Reply-To: <20250305-afabre-traits-010-rfc2-v1-16-d0ecfb869797@cloudflare.com>
[-- Attachment #1: Type: text/plain, Size: 5394 bytes --]
> From: Arthur Fabre <afabre@cloudflare.com>
>
> Hide the space used by traits from skb_headroom(): that space isn't
> actually usable.
>
> Preserve the trait store in pskb_expand_head() by copying it ahead of
> the new headroom. The struct xdp_frame at the start of the headroom
> isn't needed anymore, so we can overwrite it with traits, and introduce
> a new flag to indicate traits are stored at the start of the headroom.
>
> Cloned skbs share the same packet data and headroom as the original skb,
> so changes to traits in one would be reflected in the other.
> Is that ok?
> Are there cases where we would want a clone to have different traits?
> For now, prevent clones from using traits.
>
> Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
> ---
> include/linux/skbuff.h | 25 +++++++++++++++++++++++--
> net/core/skbuff.c | 25 +++++++++++++++++++++++--
> 2 files changed, 46 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index d7dfee152ebd26ce87a230222e94076aca793adc..886537508789202339c925b5613574de67b7e43c 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -39,6 +39,7 @@
> #include <net/net_debug.h>
> #include <net/dropreason-core.h>
> #include <net/netmem.h>
> +#include <net/trait.h>
>
> /**
> * DOC: skb checksums
> @@ -729,6 +730,8 @@ enum skb_traits_type {
> SKB_TRAITS_NONE,
> /* Trait store in headroom, offset by sizeof(struct xdp_frame) */
> SKB_TRAITS_AFTER_XDP,
> + /* Trait store at start of headroom */
> + SKB_TRAITS_AT_HEAD,
> };
>
> /**
> @@ -1029,7 +1032,7 @@ struct sk_buff {
> __u8 csum_not_inet:1;
> #endif
> __u8 unreadable:1;
> - __u8 traits_type:1; /* See enum skb_traits_type */
> + __u8 traits_type:2; /* See enum skb_traits_type */
> #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
> __u16 tc_index; /* traffic control index */
> #endif
> @@ -2836,6 +2839,18 @@ static inline void *pskb_pull(struct sk_buff *skb, unsigned int len)
>
> void skb_condense(struct sk_buff *skb);
>
> +static inline void *skb_traits(const struct sk_buff *skb)
> +{
> + switch (skb->traits_type) {
> + case SKB_TRAITS_AFTER_XDP:
> + return skb->head + _XDP_FRAME_SIZE;
> + case SKB_TRAITS_AT_HEAD:
> + return skb->head;
> + default:
> + return NULL;
> + }
> +}
> +
> /**
> * skb_headroom - bytes at buffer head
> * @skb: buffer to check
> @@ -2844,7 +2859,13 @@ void skb_condense(struct sk_buff *skb);
> */
> static inline unsigned int skb_headroom(const struct sk_buff *skb)
> {
> - return skb->data - skb->head;
> + int trait_size = 0;
> + void *traits = skb_traits(skb);
> +
> + if (traits)
> + trait_size = traits_size(traits);
> +
> + return skb->data - skb->head - trait_size;
I am not fully aware of all possible use-cases, but do we really need to
store hw medata traits (e.g. hw rx checksum or hw rx hash) in the skb
headroom when we convert the xdp_frame/xdp_buff in the skb? All of these
fields already have dedicated fields in the skb struct. Moreover, we need
to set them in order to have a real performance improvements when we execute
XDP_PASS. Something like:
https://lore.kernel.org/bpf/01ce17910fdd7b693c23132663fa884d5ec7f440.1726935917.git.lorenzo@kernel.org/
Regards,
Lorenzo
> }
>
> /**
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7b03b64fdcb276f68ce881d1d8da8e4c6b897efc..83f58517738e8ff12990c28b09336ed44f4be32a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1515,6 +1515,19 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
> atomic_inc(&(skb_shinfo(skb)->dataref));
> skb->cloned = 1;
>
> + /* traits would end up shared with the clone,
> + * and edits would be reflected there.
> + *
> + * Is that ok? What if the original skb and the clone take different paths?
> + * Does that even happen?
> + *
> + * If that's not ok, we could copy the traits and store them in an extension header
> + * for clones.
> + *
> + * For now, pretend the clone doesn't have any traits.
> + */
> + skb->traits_type = SKB_TRAITS_NONE;
> +
> return n;
> #undef C
> }
> @@ -2170,7 +2183,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> unsigned int osize = skb_end_offset(skb);
> unsigned int size = osize + nhead + ntail;
> long off;
> - u8 *data;
> + u8 *data, *head;
> int i;
>
> BUG_ON(nhead < 0);
> @@ -2187,10 +2200,18 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> goto nodata;
> size = SKB_WITH_OVERHEAD(size);
>
> + head = skb->head;
> + if (skb->traits_type != SKB_TRAITS_NONE) {
> + head = skb_traits(skb) + traits_size(skb_traits(skb));
> + /* struct xdp_frame isn't needed in the headroom, drop it */
> + memcpy(data, skb_traits(skb), traits_size(skb_traits(skb)));
> + skb->traits_type = SKB_TRAITS_AT_HEAD;
> + }
> +
> /* Copy only real data... and, alas, header. This should be
> * optimized for the cases when header is void.
> */
> - memcpy(data + nhead, skb->head, skb_tail_pointer(skb) - skb->head);
> + memcpy(data + nhead, head, skb_tail_pointer(skb) - head);
>
> memcpy((struct skb_shared_info *)(data + size),
> skb_shinfo(skb),
>
> --
> 2.43.0
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
next prev parent reply other threads:[~2025-03-10 11:45 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-05 14:31 [PATCH RFC bpf-next 00/20] traits: Per packet metadata KV store arthur
2025-03-05 14:31 ` [PATCH RFC bpf-next 01/20] trait: limited KV store for packet metadata arthur
2025-03-07 6:36 ` Alexei Starovoitov
2025-03-07 11:14 ` Arthur Fabre
2025-03-07 17:29 ` Alexei Starovoitov
2025-03-10 14:45 ` Arthur Fabre
2025-03-07 19:24 ` Jakub Sitnicki
2025-03-05 14:31 ` [PATCH RFC bpf-next 02/20] trait: XDP support arthur
2025-03-06 23:15 ` kernel test robot
2025-03-07 3:39 ` kernel test robot
2025-03-07 19:13 ` Lorenzo Bianconi
2025-03-10 15:50 ` Arthur Fabre
2025-03-05 14:32 ` [PATCH RFC bpf-next 03/20] trait: basic XDP selftest arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 04/20] trait: basic XDP benchmark arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 05/20] trait: Replace memcpy calls with inline copies arthur
2025-03-10 10:50 ` Lorenzo Bianconi
2025-03-10 15:52 ` Arthur Fabre
2025-03-10 22:15 ` David Laight
2025-03-05 14:32 ` [PATCH RFC bpf-next 06/20] trait: Replace memmove calls with inline move arthur
2025-03-06 10:14 ` Jesper Dangaard Brouer
2025-03-05 14:32 ` [PATCH RFC bpf-next 07/20] xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions arthur
2025-03-05 15:24 ` Alexander Lobakin
2025-03-05 17:02 ` Arthur Fabre
2025-03-06 11:12 ` Jesper Dangaard Brouer
2025-03-10 11:10 ` Lorenzo Bianconi
2025-03-05 14:32 ` [PATCH RFC bpf-next 08/20] trait: Propagate presence of traits to sk_buff arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 09/20] bnxt: Propagate trait presence to skb arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 10/20] ice: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 11/20] veth: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 12/20] virtio_net: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 13/20] mlx5: move xdp_buff scope one level up arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 14/20] mlx5: Propagate trait presence to skb arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 15/20] xdp generic: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 16/20] trait: Support sk_buffs arthur
2025-03-10 11:45 ` Lorenzo Bianconi [this message]
2025-03-05 14:32 ` [PATCH RFC bpf-next 17/20] trait: Allow socket filters to access traits arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 18/20] trait: registration API arthur
2025-03-06 18:52 ` kernel test robot
2025-03-05 14:32 ` [PATCH RFC bpf-next 19/20] trait: Sync linux/bpf.h to tools/ for trait registration arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 20/20] trait: register traits in benchmarks and tests arthur
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z87Qv2Jf3p3MeXRC@lore-desk \
--to=lorenzo.bianconi@redhat.com \
--cc=afabre@cloudflare.com \
--cc=arthur@arthurfabre.com \
--cc=bpf@vger.kernel.org \
--cc=hawk@kernel.org \
--cc=jakub@cloudflare.com \
--cc=jbrandeburg@cloudflare.com \
--cc=lbiancon@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=thoiland@redhat.com \
--cc=yan@cloudflare.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.