All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Arthur Fabre <arthur@arthurfabre.com>,
	Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
	jakub@cloudflare.com, yan@cloudflare.com,
	jbrandeburg@cloudflare.com, thoiland@redhat.com,
	lbiancon@redhat.com, Arthur Fabre <afabre@cloudflare.com>
Subject: Re: [PATCH RFC bpf-next 07/20] xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions
Date: Thu, 6 Mar 2025 12:12:51 +0100	[thread overview]
Message-ID: <45522396-0fad-406e-ba53-0bb4aee53e67@kernel.org> (raw)
In-Reply-To: <D88HSZ3GZZNN.160YSWHX1HIO2@arthurfabre.com>



On 05/03/2025 18.02, Arthur Fabre wrote:
> On Wed Mar 5, 2025 at 4:24 PM CET, Alexander Lobakin wrote:
>> From: Arthur <arthur@arthurfabre.com>
>> Date: Wed, 05 Mar 2025 15:32:04 +0100
>>
>>> From: Arthur Fabre <afabre@cloudflare.com>
>>>
>>> xdp_buff stores whether metadata is supported by a NIC by setting
>>> data_meta to be greater than data.
>>>
>>> But xdp_frame only stores the metadata size (as metasize), so converting
>>> between xdp_frame and xdp_buff is lossy.
>>>
>>> Steal an unused bit in xdp_frame to track whether metadata is supported
>>> or not.
>>>
>>> This will lets us have "generic" functions for setting skb fields from
>>> either xdp_frame or xdp_buff from drivers.
>>>
>>> Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
>>> ---
>>>   include/net/xdp.h | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>>> index 58019fa299b56dbd45c104fdfa807f73af6e4fa4..84afe07d09efdb2ab0cb78b904f02cb74f9a56b6 100644
>>> --- a/include/net/xdp.h
>>> +++ b/include/net/xdp.h
>>> @@ -116,6 +116,9 @@ static __always_inline void xdp_buff_set_frag_pfmemalloc(struct xdp_buff *xdp)
>>>   	xdp->flags |= XDP_FLAGS_FRAGS_PF_MEMALLOC;
>>>   }
>>>   
>>> +static bool xdp_data_meta_unsupported(const struct xdp_buff *xdp);
>>> +static void xdp_set_data_meta_invalid(struct xdp_buff *xdp);
>>> +
>>>   static __always_inline void *xdp_buff_traits(const struct xdp_buff *xdp)
>>>   {
>>>   	return xdp->data_hard_start + _XDP_FRAME_SIZE;
>>> @@ -270,7 +273,9 @@ struct xdp_frame {
>>>   	void *data;
>>>   	u32 len;
>>>   	u32 headroom;
>>> -	u32 metasize; /* uses lower 8-bits */
>>> +	u32	:23, /* unused */
>>> +		meta_unsupported:1,
>>> +		metasize:8;
>>
>> See the history of this structure how we got rid of using bitfields here
>> and why.
>>
>> ...because of performance.
>>
>> Even though metasize uses only 8 bits, 1-byte access is slower than
>> 32-byte access.
> 
> Interesting, thanks!
> 

I agree with Olek, we should not use bitfields.  Thanks for catching this.

(The xdp_frame have a flags member...)
Why don't we use the flags member for storing this information?


>> I was going to write "you can use the fact that metasize is always a
>> multiple of 4 or that it's never > 252, for example, you could reuse LSB
>> as a flag indicating that meta is not supported", but first of all
>>
>> Do we still have drivers which don't support metadata?
>> Why don't they do that? It's not HW-specific or even driver-specific.
>> They don't reserve headroom? Then they're invalid, at least XDP_REDIRECT
>> won't work.
>>

I'm fairly sure that all drivers support XDP_REDIRECT.
Except didn't Lorenzo add a feature bit for this?
(so, some drivers might explicitly not-support this)

>> So maybe we need to fix those drivers first, if there are any.
> 
> Most drivers don't support metadata unfortunately:
> 
>> rg -U "xdp_prepare_buff\([^)]*false\);" drivers/net/
> drivers/net/tun.c
> 1712:		xdp_prepare_buff(&xdp, buf, pad, len, false);
> 
> drivers/net/ethernet/microsoft/mana/mana_bpf.c
> 94:	xdp_prepare_buff(xdp, buf_va, XDP_PACKET_HEADROOM, pkt_len, false);
> 
> drivers/net/ethernet/marvell/mvneta.c
> 2344:	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
> 2345:			 data_len, false);
> 
> drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
> 1436:	xdp_prepare_buff(&xdp, hard_start, OTX2_HEAD_ROOM,
> 1437:			 cqe->sg.seg_size, false);
> 
> drivers/net/ethernet/socionext/netsec.c
> 1021:		xdp_prepare_buff(&xdp, desc->addr, NETSEC_RXBUF_HEADROOM,
> 1022:				 pkt_len, false);
> 
> drivers/net/ethernet/google/gve/gve_rx.c
> 740:	xdp_prepare_buff(&new, frame, headroom, len, false);
> 859:		xdp_prepare_buff(&xdp, page_info->page_address +
> 860:				 page_info->page_offset, GVE_RX_PAD,
> 861:				 len, false);
> 
> drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> 3984:			xdp_prepare_buff(&xdp, data,
> 3985:					 MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM,
> 3986:					 rx_bytes, false);
> 
> drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> 794:		xdp_prepare_buff(&xdp, hard_start, rx_ring->page_offset,
> 795:				 buff->len, false);
> 
> drivers/net/ethernet/cavium/thunder/nicvf_main.c
> 554:	xdp_prepare_buff(&xdp, hard_start, data - hard_start, len, false);
> 
> drivers/net/ethernet/ti/cpsw_new.c
> 348:		xdp_prepare_buff(&xdp, pa, headroom, size, false);
> 
> drivers/net/ethernet/freescale/enetc/enetc.c
> 1710:	xdp_prepare_buff(xdp_buff, hard_start - rx_ring->buffer_offset,
> 1711:			 rx_ring->buffer_offset, size, false);
> 
> drivers/net/ethernet/ti/am65-cpsw-nuss.c
> 1335:		xdp_prepare_buff(&xdp, page_addr, AM65_CPSW_HEADROOM,
> 1336:				 pkt_len, false);
> 
> drivers/net/ethernet/ti/cpsw.c
> 403:		xdp_prepare_buff(&xdp, pa, headroom, size, false);
> 
> drivers/net/ethernet/sfc/rx.c
> 289:	xdp_prepare_buff(&xdp, *ehp - EFX_XDP_HEADROOM, EFX_XDP_HEADROOM,
> 290:			 rx_buf->len, false);
> 
> drivers/net/ethernet/mediatek/mtk_eth_soc.c
> 2097:			xdp_prepare_buff(&xdp, data, MTK_PP_HEADROOM, pktlen,
> 2098:					 false);
> 
> drivers/net/ethernet/sfc/siena/rx.c
> 291:	xdp_prepare_buff(&xdp, *ehp - EFX_XDP_HEADROOM, EFX_XDP_HEADROOM,
> 292:			 rx_buf->len, false)
> 
> I don't know if it's just because no one has added calls to
> skb_metadata_set() in yet, or if there's a more fundamental reason.
> 

I simply think driver developers have been lazy.

If someone want some easy kernel commits, these drivers should be fairly
easy to fix...

> I think they all reserve some amount of headroom, but not always the
> full XDP_PACKET_HEADROOM. Eg sfc:
> 

The Intel drivers use 192 (AFAIK if that is still true). The API ended
up supporting non-standard XDP_PACKET_HEADROOM, due to the Intel
drivers, when XDP support was added to those (which is a long time ago now).

> drivers/net/ethernet/sfc/net_driver.h:
> /* Non-standard XDP_PACKET_HEADROOM and tailroom to satisfy XDP_REDIRECT and
>   * still fit two standard MTU size packets into a single 4K page.
>   */
> #define EFX_XDP_HEADROOM	128
> 

This is smaller than most drivers, but still have enough headroom for 
xdp_frame + traits.

> If it's just because skb_metadata_set() is missing, I can take the
> patches from this series that adds a "generic" XDP -> skb hook ("trait:
> Propagate presence of traits to sk_buff"), have it call
> skb_metadata_set(), and try to add it to all the drivers in a separate
> series.
> 

I think someone should cleanup those drivers and add support.

--Jesper

>>>   	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
>>>   	 * while mem_type is valid on remote CPU.
>>>   	 */
>>> @@ -369,6 +374,8 @@ void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
>>>   	xdp->data = frame->data;
>>>   	xdp->data_end = frame->data + frame->len;
>>>   	xdp->data_meta = frame->data - frame->metasize;
>>> +	if (frame->meta_unsupported)
>>> +		xdp_set_data_meta_invalid(xdp);
>>>   	xdp->frame_sz = frame->frame_sz;
>>>   	xdp->flags = frame->flags;
>>>   }
>>> @@ -396,6 +403,7 @@ int xdp_update_frame_from_buff(const struct xdp_buff *xdp,
>>>   	xdp_frame->len  = xdp->data_end - xdp->data;
>>>   	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
>>>   	xdp_frame->metasize = metasize;
>>> +	xdp_frame->meta_unsupported = xdp_data_meta_unsupported(xdp);
>>>   	xdp_frame->frame_sz = xdp->frame_sz;
>>>   	xdp_frame->flags = xdp->flags;
>>
>> Thanks,
>> Olek

  reply	other threads:[~2025-03-06 11:12 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-05 14:31 [PATCH RFC bpf-next 00/20] traits: Per packet metadata KV store arthur
2025-03-05 14:31 ` [PATCH RFC bpf-next 01/20] trait: limited KV store for packet metadata arthur
2025-03-07  6:36   ` Alexei Starovoitov
2025-03-07 11:14     ` Arthur Fabre
2025-03-07 17:29       ` Alexei Starovoitov
2025-03-10 14:45         ` Arthur Fabre
2025-03-07 19:24   ` Jakub Sitnicki
2025-03-05 14:31 ` [PATCH RFC bpf-next 02/20] trait: XDP support arthur
2025-03-06 23:15   ` kernel test robot
2025-03-07  3:39   ` kernel test robot
2025-03-07 19:13   ` Lorenzo Bianconi
2025-03-10 15:50     ` Arthur Fabre
2025-03-05 14:32 ` [PATCH RFC bpf-next 03/20] trait: basic XDP selftest arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 04/20] trait: basic XDP benchmark arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 05/20] trait: Replace memcpy calls with inline copies arthur
2025-03-10 10:50   ` Lorenzo Bianconi
2025-03-10 15:52     ` Arthur Fabre
2025-03-10 22:15   ` David Laight
2025-03-05 14:32 ` [PATCH RFC bpf-next 06/20] trait: Replace memmove calls with inline move arthur
2025-03-06 10:14   ` Jesper Dangaard Brouer
2025-03-05 14:32 ` [PATCH RFC bpf-next 07/20] xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions arthur
2025-03-05 15:24   ` Alexander Lobakin
2025-03-05 17:02     ` Arthur Fabre
2025-03-06 11:12       ` Jesper Dangaard Brouer [this message]
2025-03-10 11:10         ` Lorenzo Bianconi
2025-03-05 14:32 ` [PATCH RFC bpf-next 08/20] trait: Propagate presence of traits to sk_buff arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 09/20] bnxt: Propagate trait presence to skb arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 10/20] ice: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 11/20] veth: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 12/20] virtio_net: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 13/20] mlx5: move xdp_buff scope one level up arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 14/20] mlx5: Propagate trait presence to skb arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 15/20] xdp generic: " arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 16/20] trait: Support sk_buffs arthur
2025-03-10 11:45   ` Lorenzo Bianconi
2025-03-05 14:32 ` [PATCH RFC bpf-next 17/20] trait: Allow socket filters to access traits arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 18/20] trait: registration API arthur
2025-03-06 18:52   ` kernel test robot
2025-03-05 14:32 ` [PATCH RFC bpf-next 19/20] trait: Sync linux/bpf.h to tools/ for trait registration arthur
2025-03-05 14:32 ` [PATCH RFC bpf-next 20/20] trait: register traits in benchmarks and tests arthur

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45522396-0fad-406e-ba53-0bb4aee53e67@kernel.org \
    --to=hawk@kernel.org \
    --cc=afabre@cloudflare.com \
    --cc=aleksander.lobakin@intel.com \
    --cc=arthur@arthurfabre.com \
    --cc=bpf@vger.kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=jbrandeburg@cloudflare.com \
    --cc=lbiancon@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=thoiland@redhat.com \
    --cc=yan@cloudflare.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.