netdev.vger.kernel.org archive mirror
From: "Arthur Fabre" <arthur@arthurfabre.com>
To: "Alexander Lobakin" <aleksander.lobakin@intel.com>
Cc: <netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
	<jakub@cloudflare.com>, <hawk@kernel.org>, <yan@cloudflare.com>,
	<jbrandeburg@cloudflare.com>, <thoiland@redhat.com>,
	<lbiancon@redhat.com>, "Arthur Fabre" <afabre@cloudflare.com>
Subject: Re: [PATCH RFC bpf-next 07/20] xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions
Date: Wed, 05 Mar 2025 18:02:06 +0100	[thread overview]
Message-ID: <D88HSZ3GZZNN.160YSWHX1HIO2@arthurfabre.com> (raw)
In-Reply-To: <bc356c91-5bff-454a-8f87-7415cb7e82b4@intel.com>

On Wed Mar 5, 2025 at 4:24 PM CET, Alexander Lobakin wrote:
> From: Arthur <arthur@arthurfabre.com>
> Date: Wed, 05 Mar 2025 15:32:04 +0100
>
> > From: Arthur Fabre <afabre@cloudflare.com>
> > 
> > xdp_buff stores whether metadata is supported by a NIC by setting
> > data_meta to be greater than data.
> > 
> > But xdp_frame only stores the metadata size (as metasize), so converting
> > between xdp_frame and xdp_buff is lossy.
> > 
> > Steal an unused bit in xdp_frame to track whether metadata is supported
> > or not.
> > 
> > This will let us have "generic" functions for setting skb fields from
> > either xdp_frame or xdp_buff from drivers.
> > 
> > Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
> > ---
> >  include/net/xdp.h | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 58019fa299b56dbd45c104fdfa807f73af6e4fa4..84afe07d09efdb2ab0cb78b904f02cb74f9a56b6 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -116,6 +116,9 @@ static __always_inline void xdp_buff_set_frag_pfmemalloc(struct xdp_buff *xdp)
> >  	xdp->flags |= XDP_FLAGS_FRAGS_PF_MEMALLOC;
> >  }
> >  
> > +static bool xdp_data_meta_unsupported(const struct xdp_buff *xdp);
> > +static void xdp_set_data_meta_invalid(struct xdp_buff *xdp);
> > +
> >  static __always_inline void *xdp_buff_traits(const struct xdp_buff *xdp)
> >  {
> >  	return xdp->data_hard_start + _XDP_FRAME_SIZE;
> > @@ -270,7 +273,9 @@ struct xdp_frame {
> >  	void *data;
> >  	u32 len;
> >  	u32 headroom;
> > -	u32 metasize; /* uses lower 8-bits */
> > +	u32	:23, /* unused */
> > +		meta_unsupported:1,
> > +		metasize:8;
>
> See the history of this structure how we got rid of using bitfields here
> and why.
>
> ...because of performance.
>
> Even though metasize uses only 8 bits, a 1-byte access is slower than a
> 32-bit access.

Interesting, thanks!

> I was going to write "you can use the fact that metasize is always a
> multiple of 4 or that it's never > 252, for example, you could reuse LSB
> as a flag indicating that meta is not supported", but first of all
>
> Do we still have drivers which don't support metadata?
> Why don't they do that? It's not HW-specific or even driver-specific.
> They don't reserve headroom? Then they're invalid, at least XDP_REDIRECT
> won't work.
>
> So maybe we need to fix those drivers first, if there are any.

Most drivers don't support metadata unfortunately:

> rg -U "xdp_prepare_buff\([^)]*false\);" drivers/net/
drivers/net/tun.c
1712:		xdp_prepare_buff(&xdp, buf, pad, len, false);

drivers/net/ethernet/microsoft/mana/mana_bpf.c
94:	xdp_prepare_buff(xdp, buf_va, XDP_PACKET_HEADROOM, pkt_len, false);

drivers/net/ethernet/marvell/mvneta.c
2344:	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
2345:			 data_len, false);

drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
1436:	xdp_prepare_buff(&xdp, hard_start, OTX2_HEAD_ROOM,
1437:			 cqe->sg.seg_size, false);

drivers/net/ethernet/socionext/netsec.c
1021:		xdp_prepare_buff(&xdp, desc->addr, NETSEC_RXBUF_HEADROOM,
1022:				 pkt_len, false);

drivers/net/ethernet/google/gve/gve_rx.c
740:	xdp_prepare_buff(&new, frame, headroom, len, false);
859:		xdp_prepare_buff(&xdp, page_info->page_address +
860:				 page_info->page_offset, GVE_RX_PAD,
861:				 len, false);

drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
3984:			xdp_prepare_buff(&xdp, data,
3985:					 MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM,
3986:					 rx_bytes, false);

drivers/net/ethernet/aquantia/atlantic/aq_ring.c
794:		xdp_prepare_buff(&xdp, hard_start, rx_ring->page_offset,
795:				 buff->len, false);

drivers/net/ethernet/cavium/thunder/nicvf_main.c
554:	xdp_prepare_buff(&xdp, hard_start, data - hard_start, len, false);

drivers/net/ethernet/ti/cpsw_new.c
348:		xdp_prepare_buff(&xdp, pa, headroom, size, false);

drivers/net/ethernet/freescale/enetc/enetc.c
1710:	xdp_prepare_buff(xdp_buff, hard_start - rx_ring->buffer_offset,
1711:			 rx_ring->buffer_offset, size, false);

drivers/net/ethernet/ti/am65-cpsw-nuss.c
1335:		xdp_prepare_buff(&xdp, page_addr, AM65_CPSW_HEADROOM,
1336:				 pkt_len, false);

drivers/net/ethernet/ti/cpsw.c
403:		xdp_prepare_buff(&xdp, pa, headroom, size, false);

drivers/net/ethernet/sfc/rx.c
289:	xdp_prepare_buff(&xdp, *ehp - EFX_XDP_HEADROOM, EFX_XDP_HEADROOM,
290:			 rx_buf->len, false);

drivers/net/ethernet/mediatek/mtk_eth_soc.c
2097:			xdp_prepare_buff(&xdp, data, MTK_PP_HEADROOM, pktlen,
2098:					 false);

drivers/net/ethernet/sfc/siena/rx.c
291:	xdp_prepare_buff(&xdp, *ehp - EFX_XDP_HEADROOM, EFX_XDP_HEADROOM,
292:			 rx_buf->len, false)

I don't know if it's just because no one has added calls to
skb_metadata_set() yet, or if there's a more fundamental reason.

I think they all reserve some amount of headroom, but not always the
full XDP_PACKET_HEADROOM. E.g. sfc:

drivers/net/ethernet/sfc/net_driver.h:
/* Non-standard XDP_PACKET_HEADROOM and tailroom to satisfy XDP_REDIRECT and
 * still fit two standard MTU size packets into a single 4K page.
 */
#define EFX_XDP_HEADROOM	128

If it's just because skb_metadata_set() is missing, I can take the
patches from this series that add a "generic" XDP -> skb hook ("trait:
Propagate presence of traits to sk_buff"), have it call
skb_metadata_set(), and try to add it to all the drivers in a separate
series.

> >  	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
> >  	 * while mem_type is valid on remote CPU.
> >  	 */
> > @@ -369,6 +374,8 @@ void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
> >  	xdp->data = frame->data;
> >  	xdp->data_end = frame->data + frame->len;
> >  	xdp->data_meta = frame->data - frame->metasize;
> > +	if (frame->meta_unsupported)
> > +		xdp_set_data_meta_invalid(xdp);
> >  	xdp->frame_sz = frame->frame_sz;
> >  	xdp->flags = frame->flags;
> >  }
> > @@ -396,6 +403,7 @@ int xdp_update_frame_from_buff(const struct xdp_buff *xdp,
> >  	xdp_frame->len  = xdp->data_end - xdp->data;
> >  	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
> >  	xdp_frame->metasize = metasize;
> > +	xdp_frame->meta_unsupported = xdp_data_meta_unsupported(xdp);
> >  	xdp_frame->frame_sz = xdp->frame_sz;
> >  	xdp_frame->flags = xdp->flags;
>
> Thanks,
> Olek


Thread overview: 37+ messages
2025-03-05 14:31 [PATCH RFC bpf-next 00/20] traits: Per packet metadata KV store arthur
2025-03-05 14:31 ` [PATCH RFC bpf-next 01/20] trait: limited KV store for packet metadata arthur
2025-03-07  6:36   ` Alexei Starovoitov
2025-03-07 11:14     ` Arthur Fabre
2025-03-07 17:29       ` Alexei Starovoitov
2025-03-10 14:45         ` Arthur Fabre
2025-03-07 19:24   ` Jakub Sitnicki
2025-03-05 14:31 ` [PATCH RFC bpf-next 02/20] trait: XDP support arthur
2025-03-07 19:13   ` Lorenzo Bianconi
2025-03-10 15:50     ` Arthur Fabre
2025-03-05 14:32 ` [PATCH RFC bpf-next 03/20] trait: basic XDP selftest arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 04/20] trait: basic XDP benchmark arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 05/20] trait: Replace memcpy calls with inline copies arthur
2025-03-10 10:50   ` Lorenzo Bianconi
2025-03-10 15:52     ` Arthur Fabre
2025-03-10 22:15   ` David Laight
2025-03-05 14:32 ` [PATCH RFC bpf-next 06/20] trait: Replace memmove calls with inline move arthur
2025-03-06 10:14   ` Jesper Dangaard Brouer
2025-03-05 14:32 ` [PATCH RFC bpf-next 07/20] xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions arthur
2025-03-05 15:24   ` Alexander Lobakin
2025-03-05 17:02     ` Arthur Fabre [this message]
2025-03-06 11:12       ` Jesper Dangaard Brouer
2025-03-10 11:10         ` Lorenzo Bianconi
2025-03-05 14:32 ` [PATCH RFC bpf-next 08/20] trait: Propagate presence of traits to sk_buff arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 09/20] bnxt: Propagate trait presence to skb arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 10/20] ice: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 11/20] veth: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 12/20] virtio_net: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 13/20] mlx5: move xdp_buff scope one level up arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 14/20] mlx5: Propagate trait presence to skb arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 15/20] xdp generic: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 16/20] trait: Support sk_buffs arthur
2025-03-10 11:45   ` Lorenzo Bianconi
2025-03-05 14:32 ` [PATCH RFC bpf-next 17/20] trait: Allow socket filters to access traits arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 18/20] trait: registration API arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 19/20] trait: Sync linux/bpf.h to tools/ for trait registration arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 20/20] trait: register traits in benchmarks and tests arthur
