linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling
@ 2025-07-02 16:57 Song Yoong Siang
  2025-07-02 16:57 ` [PATCH bpf-next,v3 1/2] doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE Song Yoong Siang
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Song Yoong Siang @ 2025-07-02 16:57 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan
  Cc: netdev, linux-doc, linux-kernel, bpf, linux-kselftest

This patch set improves the documentation and selftests for XDP Rx metadata
handling. The first patch clarifies the documentation around XDP metadata
layout and METADATA_SIZE. The second patch enhances the BPF selftests to
make XDP metadata handling more robust across different NICs.

Prior to this patch set, the XDP program might accidentally overwrite the
device-reserved metadata.

V3:
  - update doc and commit msg accordingly.

V2: https://lore.kernel.org/netdev/20250702030349.3275368-1-yoong.siang.song@intel.com/
  - unconditionally do bpf_xdp_adjust_meta with -XDP_METADATA_SIZE (Stanislav)

V1: https://lore.kernel.org/netdev/20250701042940.3272325-1-yoong.siang.song@intel.com/

Song Yoong Siang (2):
  doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE
  selftests/bpf: Enhance XDP Rx metadata handling

 Documentation/networking/xdp-rx-metadata.rst  | 36 +++++++++++++++----
 .../selftests/bpf/prog_tests/xdp_metadata.c   |  2 +-
 .../selftests/bpf/progs/xdp_hw_metadata.c     |  2 +-
 .../selftests/bpf/progs/xdp_metadata.c        |  2 +-
 tools/testing/selftests/bpf/xdp_hw_metadata.c |  2 +-
 tools/testing/selftests/bpf/xdp_metadata.h    |  7 ++++
 6 files changed, 41 insertions(+), 10 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH bpf-next,v3 1/2] doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE
  2025-07-02 16:57 [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling Song Yoong Siang
@ 2025-07-02 16:57 ` Song Yoong Siang
  2025-07-03 15:57   ` Daniel Borkmann
  2025-07-02 16:57 ` [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling Song Yoong Siang
  2025-07-03 15:41 ` [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling Stanislav Fomichev
  2 siblings, 1 reply; 15+ messages in thread
From: Song Yoong Siang @ 2025-07-02 16:57 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan
  Cc: netdev, linux-doc, linux-kernel, bpf, linux-kselftest

Add diagram to show metadata layout of devices that utilize the data_meta
area for their own purposes. Besides, enhance the documentation on
selecting an appropriate METADATA_SIZE for XDP Rx metadata, ensuring it
accommodates both device-reserved and custom metadata. It includes
considerations for alignment and size constraints. The updated guidance
helps users correctly allocate and access metadata in AF_XDP scenarios.

Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 Documentation/networking/xdp-rx-metadata.rst | 36 ++++++++++++++++----
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
index a6e0ece18be5..65a1a6e0f7a2 100644
--- a/Documentation/networking/xdp-rx-metadata.rst
+++ b/Documentation/networking/xdp-rx-metadata.rst
@@ -54,6 +54,19 @@ area in whichever format it chooses. Later consumers of the metadata
 will have to agree on the format by some out of band contract (like for
 the AF_XDP use case, see below).
 
+It is important to note that some devices may utilize the ``data_meta`` area for
+their own purposes. For example, the IGC device utilizes ``IGC_TS_HDR_LEN``
+bytes of the ``data_meta`` area for receiving hardware timestamps. Therefore,
+the XDP program should ensure that it does not overwrite any existing metadata.
+The metadata layout of such device is depicted below::
+
+  +----------+-----------------+--------------------------+------+
+  | headroom | custom metadata | device-reserved metadata | data |
+  +----------+-----------------+--------------------------+------+
+             ^                                            ^
+             |                                            |
+   xdp_buff->data_meta                              xdp_buff->data
+
 AF_XDP
 ======
 
@@ -69,12 +82,23 @@ descriptor does _not_ explicitly carry the size of the metadata).
 
 Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
 
-  +----------+-----------------+------+
-  | headroom | custom metadata | data |
-  +----------+-----------------+------+
-                               ^
-                               |
-                        rx_desc->address
+             |<--------------METADATA_SIZE--------------->|
+  +----------+-----------------+--------------------------+------+
+  | headroom | custom metadata | device-reserved metadata | data |
+  +----------+-----------------+--------------------------+------+
+                                                          ^
+                                                          |
+                                                   rx_desc->address
+
+It is crucial that the agreed ``METADATA_SIZE`` between the BPF program and the
+final consumer is sufficient to accommodate both device-reserved metadata and
+the data the BPF program needs to populate.
+
+``bpf_xdp_adjust_meta`` ensures that ``METADATA_SIZE`` is aligned to 4 bytes,
+does not exceed 252 bytes, and leaves sufficient space for building the
+xdp_frame. If these conditions are not met, it returns a negative error. In this
+case, the BPF program should not proceed to populate data into the ``data_meta``
+area.
 
 XDP_PASS
 ========
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-02 16:57 [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling Song Yoong Siang
  2025-07-02 16:57 ` [PATCH bpf-next,v3 1/2] doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE Song Yoong Siang
@ 2025-07-02 16:57 ` Song Yoong Siang
  2025-07-03 17:04   ` Jesper Dangaard Brouer
  2025-07-03 15:41 ` [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling Stanislav Fomichev
  2 siblings, 1 reply; 15+ messages in thread
From: Song Yoong Siang @ 2025-07-02 16:57 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan
  Cc: netdev, linux-doc, linux-kernel, bpf, linux-kselftest

Introduce the XDP_METADATA_SIZE macro as a conservative measure to
accommodate any metadata areas reserved by Ethernet devices.

Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c | 2 +-
 tools/testing/selftests/bpf/progs/xdp_hw_metadata.c   | 2 +-
 tools/testing/selftests/bpf/progs/xdp_metadata.c      | 2 +-
 tools/testing/selftests/bpf/xdp_hw_metadata.c         | 2 +-
 tools/testing/selftests/bpf/xdp_metadata.h            | 7 +++++++
 5 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
index 19f92affc2da..8d6c2633698b 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -302,7 +302,7 @@ static int verify_xsk_metadata(struct xsk *xsk, bool sent_from_af_xdp)
 
 	/* custom metadata */
 
-	meta = data - sizeof(struct xdp_meta);
+	meta = data - XDP_METADATA_SIZE;
 
 	if (!ASSERT_NEQ(meta->rx_timestamp, 0, "rx_timestamp"))
 		return -1;
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index 330ece2eabdb..3766f58d3486 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -72,7 +72,7 @@ int rx(struct xdp_md *ctx)
 		return XDP_PASS;
 	}
 
-	err = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	err = bpf_xdp_adjust_meta(ctx, -(int)XDP_METADATA_SIZE);
 	if (err) {
 		__sync_add_and_fetch(&pkts_fail, 1);
 		return XDP_PASS;
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
index 09bb8a038d52..5cada85fe0f4 100644
--- a/tools/testing/selftests/bpf/progs/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
@@ -73,7 +73,7 @@ int rx(struct xdp_md *ctx)
 
 	/* Reserve enough for all custom metadata. */
 
-	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
+	ret = bpf_xdp_adjust_meta(ctx, -(int)XDP_METADATA_SIZE);
 	if (ret != 0)
 		return XDP_DROP;
 
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 3d8de0d4c96a..a529d55d4ff4 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -223,7 +223,7 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
 {
 	struct xdp_meta *meta;
 
-	meta = data - sizeof(*meta);
+	meta = data - XDP_METADATA_SIZE;
 
 	if (meta->hint_valid & XDP_META_FIELD_RSS)
 		printf("rx_hash: 0x%X with RSS type:0x%X\n",
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 87318ad1117a..2dfd3bf5e7bb 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -50,3 +50,10 @@ struct xdp_meta {
 	};
 	enum xdp_meta_field hint_valid;
 };
+
+/* XDP_METADATA_SIZE must be at least the size of struct xdp_meta. An additional
+ * 32 bytes of padding is included as a conservative measure to accommodate any
+ * metadata areas reserved by Ethernet devices. If the device-reserved metadata
+ * exceeds 32 bytes, this value will need adjustment.
+ */
+#define XDP_METADATA_SIZE	(sizeof(struct xdp_meta) + 32)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling
  2025-07-02 16:57 [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling Song Yoong Siang
  2025-07-02 16:57 ` [PATCH bpf-next,v3 1/2] doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE Song Yoong Siang
  2025-07-02 16:57 ` [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling Song Yoong Siang
@ 2025-07-03 15:41 ` Stanislav Fomichev
  2 siblings, 0 replies; 15+ messages in thread
From: Stanislav Fomichev @ 2025-07-03 15:41 UTC (permalink / raw)
  To: Song Yoong Siang
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan, netdev, linux-doc,
	linux-kernel, bpf, linux-kselftest

On 07/03, Song Yoong Siang wrote:
> This patch set improves the documentation and selftests for XDP Rx metadata
> handling. The first patch clarifies the documentation around XDP metadata
> layout and METADATA_SIZE. The second patch enhances the BPF selftests to
> make XDP metadata handling more robust across different NICs.
> 
> Prior to this patch set, the XDP program might accidentally overwrite the
> device-reserved metadata.
> 
> V3:
>   - update doc and commit msg accordingly.
> 
> V2: https://lore.kernel.org/netdev/20250702030349.3275368-1-yoong.siang.song@intel.com/
>   - unconditionally do bpf_xdp_adjust_meta with -XDP_METADATA_SIZE (Stanislav)
> 
> V1: https://lore.kernel.org/netdev/20250701042940.3272325-1-yoong.siang.song@intel.com/

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 1/2] doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE
  2025-07-02 16:57 ` [PATCH bpf-next,v3 1/2] doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE Song Yoong Siang
@ 2025-07-03 15:57   ` Daniel Borkmann
  2025-07-04  0:33     ` Song, Yoong Siang
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2025-07-03 15:57 UTC (permalink / raw)
  To: Song Yoong Siang, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Alexei Starovoitov,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev, linux-doc, linux-kernel, bpf, linux-kselftest

On 7/2/25 6:57 PM, Song Yoong Siang wrote:
[...]
> +It is important to note that some devices may utilize the ``data_meta`` area for
> +their own purposes. For example, the IGC device utilizes ``IGC_TS_HDR_LEN``
> +bytes of the ``data_meta`` area for receiving hardware timestamps. Therefore,
> +the XDP program should ensure that it does not overwrite any existing metadata.
> +The metadata layout of such device is depicted below::
> +
> +  +----------+-----------------+--------------------------+------+
> +  | headroom | custom metadata | device-reserved metadata | data |
> +  +----------+-----------------+--------------------------+------+
> +             ^                                            ^
> +             |                                            |
> +   xdp_buff->data_meta                              xdp_buff->data

Imho, this section is misleading to developers. Suppose you're a XDP program writer
and you want to implement a generic native BPF program (independent of the underlying
NIC). Does this mean, the expectation is to dig into driver code to gather whether
or not a driver is prepopulating and how much of it? What are the implications if the
data is overwritten? For example, in Cilium today we use the buffer described here
as device-reserved metadata and override it. How will users know what breaks?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-02 16:57 ` [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling Song Yoong Siang
@ 2025-07-03 17:04   ` Jesper Dangaard Brouer
  2025-07-04  1:17     ` Song, Yoong Siang
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2025-07-03 17:04 UTC (permalink / raw)
  To: Song Yoong Siang, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Stanislav Fomichev,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev, linux-doc, linux-kernel, bpf, linux-kselftest



On 02/07/2025 18.57, Song Yoong Siang wrote:
> Introduce the XDP_METADATA_SIZE macro as a conservative measure to
> accommodate any metadata areas reserved by Ethernet devices.
> 

This seems like a sloppy workaround :-(

To me, the problem arise because AF_XDP is lacking the ability to
communicate the size of the data_meta area.  If we had this capability,
then we could allow the IGC driver to take some of the space, have the
BPF-prog expand it futher (bpf_xdp_adjust_meta) and then userspace
AF_XDP would simply be able to see the size of the data_meta area, and
apply the struct xdp_meta at right offset.


> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
> ---
>   tools/testing/selftests/bpf/prog_tests/xdp_metadata.c | 2 +-
>   tools/testing/selftests/bpf/progs/xdp_hw_metadata.c   | 2 +-
>   tools/testing/selftests/bpf/progs/xdp_metadata.c      | 2 +-
>   tools/testing/selftests/bpf/xdp_hw_metadata.c         | 2 +-
>   tools/testing/selftests/bpf/xdp_metadata.h            | 7 +++++++
>   5 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
> index 19f92affc2da..8d6c2633698b 100644
> --- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
> @@ -302,7 +302,7 @@ static int verify_xsk_metadata(struct xsk *xsk, bool sent_from_af_xdp)
>   
>   	/* custom metadata */
>   
> -	meta = data - sizeof(struct xdp_meta);
> +	meta = data - XDP_METADATA_SIZE;
>   
>   	if (!ASSERT_NEQ(meta->rx_timestamp, 0, "rx_timestamp"))
>   		return -1;
> diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> index 330ece2eabdb..3766f58d3486 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
> @@ -72,7 +72,7 @@ int rx(struct xdp_md *ctx)
>   		return XDP_PASS;
>   	}
>   
> -	err = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
> +	err = bpf_xdp_adjust_meta(ctx, -(int)XDP_METADATA_SIZE);
>   	if (err) {
>   		__sync_add_and_fetch(&pkts_fail, 1);
>   		return XDP_PASS;
> diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
> index 09bb8a038d52..5cada85fe0f4 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_metadata.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
> @@ -73,7 +73,7 @@ int rx(struct xdp_md *ctx)
>   
>   	/* Reserve enough for all custom metadata. */
>   
> -	ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(struct xdp_meta));
> +	ret = bpf_xdp_adjust_meta(ctx, -(int)XDP_METADATA_SIZE);
>   	if (ret != 0)
>   		return XDP_DROP;
>   
> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> index 3d8de0d4c96a..a529d55d4ff4 100644
> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> @@ -223,7 +223,7 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
>   {
>   	struct xdp_meta *meta;
>   
> -	meta = data - sizeof(*meta);
> +	meta = data - XDP_METADATA_SIZE;
>   
>   	if (meta->hint_valid & XDP_META_FIELD_RSS)
>   		printf("rx_hash: 0x%X with RSS type:0x%X\n",
> diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
> index 87318ad1117a..2dfd3bf5e7bb 100644
> --- a/tools/testing/selftests/bpf/xdp_metadata.h
> +++ b/tools/testing/selftests/bpf/xdp_metadata.h
> @@ -50,3 +50,10 @@ struct xdp_meta {
>   	};
>   	enum xdp_meta_field hint_valid;
>   };
> +
> +/* XDP_METADATA_SIZE must be at least the size of struct xdp_meta. An additional
> + * 32 bytes of padding is included as a conservative measure to accommodate any
> + * metadata areas reserved by Ethernet devices. If the device-reserved metadata
> + * exceeds 32 bytes, this value will need adjustment.
> + */
> +#define XDP_METADATA_SIZE	(sizeof(struct xdp_meta) + 32)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH bpf-next,v3 1/2] doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE
  2025-07-03 15:57   ` Daniel Borkmann
@ 2025-07-04  0:33     ` Song, Yoong Siang
  0 siblings, 0 replies; 15+ messages in thread
From: Song, Yoong Siang @ 2025-07-04  0:33 UTC (permalink / raw)
  To: Daniel Borkmann, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Alexei Starovoitov,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org

On Thursday, July 3, 2025 11:58 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>On 7/2/25 6:57 PM, Song Yoong Siang wrote:
>[...]
>> +It is important to note that some devices may utilize the ``data_meta`` area for
>> +their own purposes. For example, the IGC device utilizes ``IGC_TS_HDR_LEN``
>> +bytes of the ``data_meta`` area for receiving hardware timestamps. Therefore,
>> +the XDP program should ensure that it does not overwrite any existing metadata.
>> +The metadata layout of such device is depicted below::
>> +
>> +  +----------+-----------------+--------------------------+------+
>> +  | headroom | custom metadata | device-reserved metadata | data |
>> +  +----------+-----------------+--------------------------+------+
>> +             ^                                            ^
>> +             |                                            |
>> +   xdp_buff->data_meta                              xdp_buff->data
>
>Imho, this section is misleading to developers. Suppose you're a XDP program writer
>and you want to implement a generic native BPF program (independent of the underlying
>NIC). Does this mean, the expectation is to dig into driver code to gather whether
>or not a driver is prepopulating and how much of it? What are the implications if the
>data is overwritten? For example, in Cilium today we use the buffer described here
>as device-reserved metadata and override it. How will users know what breaks?

Thanks for your input.

A generic XDP program can always check the size of device-reserved metadata by
"ctx->data - ctx->data_meta" and avoid overwrite it, as shown in code below in my
v1 submission [1]. This requires driver to expose the metadata length used [2].
However, I dint have good justification for making the metadata length user-visible.
So, I submitted this v3 to keep it simple. Any thoughts?

+	metalen_used = ctx->data - ctx->data_meta;
+	metalen_to_adjust = XDP_METADATA_SIZE - metalen_used;
+	if (metalen_to_adjust < (int)sizeof(struct xdp_meta))
+		return XDP_DROP;
+
+	ret = bpf_xdp_adjust_meta(ctx, -metalen_to_adjust);

[1] https://lore.kernel.org/netdev/20250701042940.3272325-3-yoong.siang.song@intel.com/
[2] https://lore.kernel.org/netdev/20250701080955.3273137-1-yoong.siang.song@intel.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-03 17:04   ` Jesper Dangaard Brouer
@ 2025-07-04  1:17     ` Song, Yoong Siang
  2025-07-04  9:58       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 15+ messages in thread
From: Song, Yoong Siang @ 2025-07-04  1:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan
  Cc: netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org

On Friday, July 4, 2025 1:05 AM, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>On 02/07/2025 18.57, Song Yoong Siang wrote:
>> Introduce the XDP_METADATA_SIZE macro as a conservative measure to
>> accommodate any metadata areas reserved by Ethernet devices.
>>
>
>This seems like a sloppy workaround :-(
>
>To me, the problem arise because AF_XDP is lacking the ability to
>communicate the size of the data_meta area.  If we had this capability,
>then we could allow the IGC driver to take some of the space, have the
>BPF-prog expand it futher (bpf_xdp_adjust_meta) and then userspace
>AF_XDP would simply be able to see the size of the data_meta area, and
>apply the struct xdp_meta at right offset.
>
Thanks for your input.

I agree with you that the implementation will be simple if user application
able to get the size of data_meta area. The intention of this patch set is to let
developer aware of such limitations before we have a perfect solution.

Btw, do you got any suggestion on how to expose the metadata length?
I not sure whether xdp_desc.options is a simple and good idea or not?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-04  1:17     ` Song, Yoong Siang
@ 2025-07-04  9:58       ` Jesper Dangaard Brouer
  2025-07-04 11:38         ` Daniel Borkmann
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Dangaard Brouer @ 2025-07-04  9:58 UTC (permalink / raw)
  To: Song, Yoong Siang, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Stanislav Fomichev,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, Magnus Karlsson, Björn Töpel,
	Maciej Fijalkowski, Jonathan Lemon
  Cc: netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org



On 04/07/2025 03.17, Song, Yoong Siang wrote:
> On Friday, July 4, 2025 1:05 AM, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>> On 02/07/2025 18.57, Song Yoong Siang wrote:
>>> Introduce the XDP_METADATA_SIZE macro as a conservative measure to
>>> accommodate any metadata areas reserved by Ethernet devices.
>>>
>>
>> This seems like a sloppy workaround :-(
>>
>> To me, the problem arise because AF_XDP is lacking the ability to
>> communicate the size of the data_meta area.  If we had this capability,
>> then we could allow the IGC driver to take some of the space, have the
>> BPF-prog expand it futher (bpf_xdp_adjust_meta) and then userspace
>> AF_XDP would simply be able to see the size of the data_meta area, and
>> apply the struct xdp_meta at right offset.
>>
> Thanks for your input.
> 
> I agree with you that the implementation will be simple if user application
> able to get the size of data_meta area. The intention of this patch set is to let
> developer aware of such limitations before we have a perfect solution.
> 
> Btw, do you got any suggestion on how to expose the metadata length?
> I not sure whether xdp_desc.options is a simple and good idea or not?

That is a question to the AF_XDP maintainers... added them to this email.

/* Rx/Tx descriptor */
struct xdp_desc {
	__u64 addr;
	__u32 len;
	__u32 options;
};

As far as I know, the xdp_desc.options field isn't used, right?


(Please AF_XDP experts, please verify below statements:)
Something else we likely want to document: The available headroom in the
AF_XDP frame.  When accessing the metadata in userspace AF_XDP we do a
negative offset from the UMEM packet pointer.  IIRC on RX the available
headroom will be either 255 or 192 bytes (depending on NIC drivers).

Slightly confusing when AF_XDP transmitting from userspace the UMEM
headroom is default zero (XSK_UMEM__DEFAULT_FRAME_HEADROOM is zero).
This is configurable via xsk_umem_config.frame_headroom, like I did in
this example[1].

Maybe I did something wrong in[1], because I see that the new method is
setting xsk_umem_config.tx_metadata_len + flag XDP_UMEM_TX_METADATA_LEN.
This is nicely documented in [2]. How does this interact with setting
xsk_umem_config.frame_headroom ?


[1] 
https://github.com/xdp-project/bpf-examples/blob/3f365af4be1fe6a0ef77e751ff9b12c912810453/AF_XDP-interaction/af_xdp_user.c#L423-L424
[2] https://www.kernel.org/doc/html/v6.12/networking/xsk-tx-metadata.html

--Jesper

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-04  9:58       ` Jesper Dangaard Brouer
@ 2025-07-04 11:38         ` Daniel Borkmann
  2025-07-07 15:03           ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2025-07-04 11:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Song, Yoong Siang, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Alexei Starovoitov, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan, Magnus Karlsson,
	Björn Töpel, Maciej Fijalkowski, Jonathan Lemon
  Cc: netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org

On 7/4/25 11:58 AM, Jesper Dangaard Brouer wrote:
> On 04/07/2025 03.17, Song, Yoong Siang wrote:
>> On Friday, July 4, 2025 1:05 AM, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>>> On 02/07/2025 18.57, Song Yoong Siang wrote:
>>>> Introduce the XDP_METADATA_SIZE macro as a conservative measure to
>>>> accommodate any metadata areas reserved by Ethernet devices.
>>>
>>> This seems like a sloppy workaround :-(
>>>
>>> To me, the problem arise because AF_XDP is lacking the ability to
>>> communicate the size of the data_meta area.  If we had this capability,
>>> then we could allow the IGC driver to take some of the space, have the
>>> BPF-prog expand it futher (bpf_xdp_adjust_meta) and then userspace
>>> AF_XDP would simply be able to see the size of the data_meta area, and
>>> apply the struct xdp_meta at right offset.
>>>
>> Thanks for your input.
>>
>> I agree with you that the implementation will be simple if user application
>> able to get the size of data_meta area. The intention of this patch set is to let
>> developer aware of such limitations before we have a perfect solution.
>>
>> Btw, do you got any suggestion on how to expose the metadata length?
>> I not sure whether xdp_desc.options is a simple and good idea or not?
> 
> That is a question to the AF_XDP maintainers... added them to this email.
> 
> /* Rx/Tx descriptor */
> struct xdp_desc {
>      __u64 addr;
>      __u32 len;
>      __u32 options;
> };
> 
> As far as I know, the xdp_desc.options field isn't used, right?

The options holds flags, see also XDP_PKT_CONTD and XDP_TX_METADATA.

> (Please AF_XDP experts, please verify below statements:)
> Something else we likely want to document: The available headroom in the
> AF_XDP frame.  When accessing the metadata in userspace AF_XDP we do a
> negative offset from the UMEM packet pointer.  IIRC on RX the available
> headroom will be either 255 or 192 bytes (depending on NIC drivers).
> 
> Slightly confusing when AF_XDP transmitting from userspace the UMEM
> headroom is default zero (XSK_UMEM__DEFAULT_FRAME_HEADROOM is zero).
> This is configurable via xsk_umem_config.frame_headroom, like I did in
> this example[1].
> 
> Maybe I did something wrong in[1], because I see that the new method is
> setting xsk_umem_config.tx_metadata_len + flag XDP_UMEM_TX_METADATA_LEN.
> This is nicely documented in [2]. How does this interact with setting
> xsk_umem_config.frame_headroom ?

If you request XDP_UMEM_TX_METADATA_LEN then on TX side you can fill
struct xsk_tx_metadata before the start of packet data, that is,
meta = data - sizeof(struct xsk_tx_metadata). The validity of the
latter is indicated via desc->options |= XDP_TX_METADATA and then
you fill meta->flags with things like XDP_TXMD_FLAGS_CHECKSUM to
tell that the related fields are valid (ex. request.csum_start,
request.csum_offset) and that you expect the driver to do the
offload with this info. This is also what I mentioned in the other
thread some time ago that imho it would make sense to have this also
on RX side somewhat similar to virtio_net_hdr..

> [1] https://github.com/xdp-project/bpf-examples/blob/3f365af4be1fe6a0ef77e751ff9b12c912810453/AF_XDP-interaction/af_xdp_user.c#L423-L424
> [2] https://www.kernel.org/doc/html/v6.12/networking/xsk-tx-metadata.html
> 
> --Jesper


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-04 11:38         ` Daniel Borkmann
@ 2025-07-07 15:03           ` Stanislav Fomichev
  2025-07-09 14:00             ` Daniel Borkmann
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2025-07-07 15:03 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jesper Dangaard Brouer, Song, Yoong Siang, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Alexei Starovoitov, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan, Magnus Karlsson,
	Björn Töpel, Maciej Fijalkowski, Jonathan Lemon,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org

On 07/04, Daniel Borkmann wrote:
> On 7/4/25 11:58 AM, Jesper Dangaard Brouer wrote:
> > On 04/07/2025 03.17, Song, Yoong Siang wrote:
> > > On Friday, July 4, 2025 1:05 AM, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> > > > On 02/07/2025 18.57, Song Yoong Siang wrote:
> > > > > Introduce the XDP_METADATA_SIZE macro as a conservative measure to
> > > > > accommodate any metadata areas reserved by Ethernet devices.
> > > > 
> > > > This seems like a sloppy workaround :-(
> > > > 
> > > > To me, the problem arise because AF_XDP is lacking the ability to
> > > > communicate the size of the data_meta area.  If we had this capability,
> > > > then we could allow the IGC driver to take some of the space, have the
> > > > BPF-prog expand it futher (bpf_xdp_adjust_meta) and then userspace
> > > > AF_XDP would simply be able to see the size of the data_meta area, and
> > > > apply the struct xdp_meta at right offset.
> > > > 
> > > Thanks for your input.
> > > 
> > > I agree with you that the implementation will be simple if user application
> > > able to get the size of data_meta area. The intention of this patch set is to let
> > > developer aware of such limitations before we have a perfect solution.
> > > 
> > > Btw, do you got any suggestion on how to expose the metadata length?
> > > I not sure whether xdp_desc.options is a simple and good idea or not?
> > 
> > That is a question to the AF_XDP maintainers... added them to this email.
> > 
> > /* Rx/Tx descriptor */
> > struct xdp_desc {
> >      __u64 addr;
> >      __u32 len;
> >      __u32 options;
> > };
> > 
> > As far as I know, the xdp_desc.options field isn't used, right?
> 
> The options holds flags, see also XDP_PKT_CONTD and XDP_TX_METADATA.
> 
> > (Please AF_XDP experts, please verify below statements:)
> > Something else we likely want to document: The available headroom in the
> > AF_XDP frame.  When accessing the metadata in userspace AF_XDP we do a
> > negative offset from the UMEM packet pointer.  IIRC on RX the available
> > headroom will be either 255 or 192 bytes (depending on NIC drivers).
> > 
> > Slightly confusing when AF_XDP transmitting from userspace the UMEM
> > headroom is default zero (XSK_UMEM__DEFAULT_FRAME_HEADROOM is zero).
> > This is configurable via xsk_umem_config.frame_headroom, like I did in
> > this example[1].
> > 
> > Maybe I did something wrong in[1], because I see that the new method is
> > setting xsk_umem_config.tx_metadata_len + flag XDP_UMEM_TX_METADATA_LEN.
> > This is nicely documented in [2]. How does this interact with setting
> > xsk_umem_config.frame_headroom ?
> 
> If you request XDP_UMEM_TX_METADATA_LEN then on TX side you can fill
> struct xsk_tx_metadata before the start of packet data, that is,
> meta = data - sizeof(struct xsk_tx_metadata). The validity of the
> latter is indicated via desc->options |= XDP_TX_METADATA and then
> you fill meta->flags with things like XDP_TXMD_FLAGS_CHECKSUM to
> tell that the related fields are valid (ex. request.csum_start,
> request.csum_offset) and that you expect the driver to do the
> offload with this info. This is also what I mentioned in the other
> thread some time ago that imho it would make sense to have this also
> on RX side somewhat similar to virtio_net_hdr..

Let's at least document the current behavior where some (small minority of)
drivers can reuse the rx metadata area for some of its state? If we want
to improve on that by adding another knob, we can follow up?
(but I remember last time it was discussed, about a year ago, people
were not enthusiastic about another parameter exported as uapi)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-07 15:03           ` Stanislav Fomichev
@ 2025-07-09 14:00             ` Daniel Borkmann
  2025-07-09 16:29               ` Stanislav Fomichev
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2025-07-09 14:00 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Jesper Dangaard Brouer, Song, Yoong Siang, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Alexei Starovoitov, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan, Magnus Karlsson,
	Björn Töpel, Maciej Fijalkowski, Jonathan Lemon,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org

On 7/7/25 5:03 PM, Stanislav Fomichev wrote:
> On 07/04, Daniel Borkmann wrote:
>> On 7/4/25 11:58 AM, Jesper Dangaard Brouer wrote:
>>> On 04/07/2025 03.17, Song, Yoong Siang wrote:
>>>> On Friday, July 4, 2025 1:05 AM, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>>>>> On 02/07/2025 18.57, Song Yoong Siang wrote:
>>>>>> Introduce the XDP_METADATA_SIZE macro as a conservative measure to
>>>>>> accommodate any metadata areas reserved by Ethernet devices.
>>>>>
>>>>> This seems like a sloppy workaround :-(
>>>>>
>>>>> To me, the problem arise because AF_XDP is lacking the ability to
>>>>> communicate the size of the data_meta area.  If we had this capability,
>>>>> then we could allow the IGC driver to take some of the space, have the
>>>>> BPF-prog expand it futher (bpf_xdp_adjust_meta) and then userspace
>>>>> AF_XDP would simply be able to see the size of the data_meta area, and
>>>>> apply the struct xdp_meta at right offset.
>>>>>
>>>> Thanks for your input.
>>>>
>>>> I agree with you that the implementation will be simple if user application
>>>> able to get the size of data_meta area. The intention of this patch set is to let
>>>> developer aware of such limitations before we have a perfect solution.
>>>>
>>>> Btw, do you got any suggestion on how to expose the metadata length?
>>>> I not sure whether xdp_desc.options is a simple and good idea or not?
>>>
>>> That is a question to the AF_XDP maintainers... added them to this email.
>>>
>>> /* Rx/Tx descriptor */
>>> struct xdp_desc {
>>>       __u64 addr;
>>>       __u32 len;
>>>       __u32 options;
>>> };
>>>
>>> As far as I know, the xdp_desc.options field isn't used, right?
>>
>> The options holds flags, see also XDP_PKT_CONTD and XDP_TX_METADATA.
>>
>>> (Please AF_XDP experts, please verify below statements:)
>>> Something else we likely want to document: The available headroom in the
>>> AF_XDP frame.  When accessing the metadata in userspace AF_XDP we do a
>>> negative offset from the UMEM packet pointer.  IIRC on RX the available
>>> headroom will be either 255 or 192 bytes (depending on NIC drivers).
>>>
>>> Slightly confusing when AF_XDP transmitting from userspace the UMEM
>>> headroom is default zero (XSK_UMEM__DEFAULT_FRAME_HEADROOM is zero).
>>> This is configurable via xsk_umem_config.frame_headroom, like I did in
>>> this example[1].
>>>
>>> Maybe I did something wrong in[1], because I see that the new method is
>>> setting xsk_umem_config.tx_metadata_len + flag XDP_UMEM_TX_METADATA_LEN.
>>> This is nicely documented in [2]. How does this interact with setting
>>> xsk_umem_config.frame_headroom ?
>>
>> If you request XDP_UMEM_TX_METADATA_LEN then on TX side you can fill
>> struct xsk_tx_metadata before the start of packet data, that is,
>> meta = data - sizeof(struct xsk_tx_metadata). The validity of the
>> latter is indicated via desc->options |= XDP_TX_METADATA and then
>> you fill meta->flags with things like XDP_TXMD_FLAGS_CHECKSUM to
>> tell that the related fields are valid (ex. request.csum_start,
>> request.csum_offset) and that you expect the driver to do the
>> offload with this info. This is also what I mentioned in the other
>> thread some time ago that imho it would make sense to have this also
>> on RX side somewhat similar to virtio_net_hdr..
> 
> Let's at least document the current behavior where some (small minority of)
> drivers can reuse the rx metadata area for some of its state? If we want
> to improve on that by adding another knob, we can follow up?
> (but I remember last time it was discussed, about a year ago, people
> were not enthusiastic about another parameter exported as uapi)

But its still fundamentally broken no? Unless there is no harm for BPF devs
to override that rx metadata area when the pkt later on goes up the stack, but
it sounds this is not the case here. Iiuc, Yoong is trying a different approach
now to prepend before data_hard_start [0]? Then if BPF prog needs it, igc
already implements xmo_rx_timestamp callback which can copy it from there.

   [0] https://lore.kernel.org/bpf/20250707191742.662aeffb@kernel.org/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-09 14:00             ` Daniel Borkmann
@ 2025-07-09 16:29               ` Stanislav Fomichev
  2025-07-10 15:35                 ` Song, Yoong Siang
  0 siblings, 1 reply; 15+ messages in thread
From: Stanislav Fomichev @ 2025-07-09 16:29 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jesper Dangaard Brouer, Song, Yoong Siang, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Alexei Starovoitov, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan, Magnus Karlsson,
	Björn Töpel, Maciej Fijalkowski, Jonathan Lemon,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org

On 07/09, Daniel Borkmann wrote:
> On 7/7/25 5:03 PM, Stanislav Fomichev wrote:
> > On 07/04, Daniel Borkmann wrote:
> > > On 7/4/25 11:58 AM, Jesper Dangaard Brouer wrote:
> > > > On 04/07/2025 03.17, Song, Yoong Siang wrote:
> > > > > On Friday, July 4, 2025 1:05 AM, Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> > > > > > On 02/07/2025 18.57, Song Yoong Siang wrote:
> > > > > > > Introduce the XDP_METADATA_SIZE macro as a conservative measure to
> > > > > > > accommodate any metadata areas reserved by Ethernet devices.
> > > > > > 
> > > > > > This seems like a sloppy workaround :-(
> > > > > > 
> > > > > > To me, the problem arise because AF_XDP is lacking the ability to
> > > > > > communicate the size of the data_meta area.  If we had this capability,
> > > > > > then we could allow the IGC driver to take some of the space, have the
> > > > > > BPF-prog expand it futher (bpf_xdp_adjust_meta) and then userspace
> > > > > > AF_XDP would simply be able to see the size of the data_meta area, and
> > > > > > apply the struct xdp_meta at right offset.
> > > > > > 
> > > > > Thanks for your input.
> > > > > 
> > > > > I agree with you that the implementation will be simple if user application
> > > > > able to get the size of data_meta area. The intention of this patch set is to let
> > > > > developer aware of such limitations before we have a perfect solution.
> > > > > 
> > > > > Btw, do you got any suggestion on how to expose the metadata length?
> > > > > I not sure whether xdp_desc.options is a simple and good idea or not?
> > > > 
> > > > That is a question to the AF_XDP maintainers... added them to this email.
> > > > 
> > > > /* Rx/Tx descriptor */
> > > > struct xdp_desc {
> > > >       __u64 addr;
> > > >       __u32 len;
> > > >       __u32 options;
> > > > };
> > > > 
> > > > As far as I know, the xdp_desc.options field isn't used, right?
> > > 
> > > The options holds flags, see also XDP_PKT_CONTD and XDP_TX_METADATA.
> > > 
> > > > (Please AF_XDP experts, please verify below statements:)
> > > > Something else we likely want to document: The available headroom in the
> > > > AF_XDP frame.  When accessing the metadata in userspace AF_XDP we do a
> > > > negative offset from the UMEM packet pointer.  IIRC on RX the available
> > > > headroom will be either 255 or 192 bytes (depending on NIC drivers).
> > > > 
> > > > Slightly confusing when AF_XDP transmitting from userspace the UMEM
> > > > headroom is default zero (XSK_UMEM__DEFAULT_FRAME_HEADROOM is zero).
> > > > This is configurable via xsk_umem_config.frame_headroom, like I did in
> > > > this example[1].
> > > > 
> > > > Maybe I did something wrong in[1], because I see that the new method is
> > > > setting xsk_umem_config.tx_metadata_len + flag XDP_UMEM_TX_METADATA_LEN.
> > > > This is nicely documented in [2]. How does this interact with setting
> > > > xsk_umem_config.frame_headroom ?
> > > 
> > > If you request XDP_UMEM_TX_METADATA_LEN then on TX side you can fill
> > > struct xsk_tx_metadata before the start of packet data, that is,
> > > meta = data - sizeof(struct xsk_tx_metadata). The validity of the
> > > latter is indicated via desc->options |= XDP_TX_METADATA and then
> > > you fill meta->flags with things like XDP_TXMD_FLAGS_CHECKSUM to
> > > tell that the related fields are valid (ex. request.csum_start,
> > > request.csum_offset) and that you expect the driver to do the
> > > offload with this info. This is also what I mentioned in the other
> > > thread some time ago that imho it would make sense to have this also
> > > on RX side somewhat similar to virtio_net_hdr..
> > 
> > Let's at least document the current behavior where some (small minority of)
> > drivers can reuse the rx metadata area for some of its state? If we want
> > to improve on that by adding another knob, we can follow up?
> > (but I remember last time it was discussed, about a year ago, people
> > were not enthusiastic about another parameter exported as uapi)
> 
> But its still fundamentally broken no? Unless there is no harm for BPF devs
> to override that rx metadata area when the pkt later on goes up the stack, but
> it sounds this is not the case here. Iiuc, Yoong is trying a different approach
> now to prepend before data_hard_start [0]? Then if BPF prog needs it, igc
> already implements xmo_rx_timestamp callback which can copy it from there.
> 
>   [0] https://lore.kernel.org/bpf/20250707191742.662aeffb@kernel.org/

True, Jakub mentioned the same thread to me. This is, indeed, a better
idea!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-09 16:29               ` Stanislav Fomichev
@ 2025-07-10 15:35                 ` Song, Yoong Siang
  2025-07-10 17:28                   ` Jakub Kicinski
  0 siblings, 1 reply; 15+ messages in thread
From: Song, Yoong Siang @ 2025-07-10 15:35 UTC (permalink / raw)
  To: Stanislav Fomichev, Daniel Borkmann
  Cc: Jesper Dangaard Brouer, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Alexei Starovoitov, John Fastabend, Stanislav Fomichev,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, Magnus Karlsson, Björn Töpel,
	Fijalkowski, Maciej, Jonathan Lemon, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org

On Thursday, July 10, 2025 12:29 AM, Stanislav Fomichev <stfomichev@gmail.com> wrote:
>On 07/09, Daniel Borkmann wrote:
>> On 7/7/25 5:03 PM, Stanislav Fomichev wrote:
>> > On 07/04, Daniel Borkmann wrote:
>> > > On 7/4/25 11:58 AM, Jesper Dangaard Brouer wrote:
>> > > > On 04/07/2025 03.17, Song, Yoong Siang wrote:
>> > > > > On Friday, July 4, 2025 1:05 AM, Jesper Dangaard Brouer
><hawk@kernel.org> wrote:
>> > > > > > On 02/07/2025 18.57, Song Yoong Siang wrote:
>> > > > > > > Introduce the XDP_METADATA_SIZE macro as a conservative measure to
>> > > > > > > accommodate any metadata areas reserved by Ethernet devices.
>> > > > > >
>> > > > > > This seems like a sloppy workaround :-(
>> > > > > >
>> > > > > > To me, the problem arise because AF_XDP is lacking the ability to
>> > > > > > communicate the size of the data_meta area.  If we had this capability,
>> > > > > > then we could allow the IGC driver to take some of the space, have the
>> > > > > > BPF-prog expand it futher (bpf_xdp_adjust_meta) and then userspace
>> > > > > > AF_XDP would simply be able to see the size of the data_meta area, and
>> > > > > > apply the struct xdp_meta at right offset.
>> > > > > >
>> > > > > Thanks for your input.
>> > > > >
>> > > > > I agree with you that the implementation will be simple if user application
>> > > > > able to get the size of data_meta area. The intention of this patch set is to let
>> > > > > developer aware of such limitations before we have a perfect solution.
>> > > > >
>> > > > > Btw, do you got any suggestion on how to expose the metadata length?
>> > > > > I not sure whether xdp_desc.options is a simple and good idea or not?
>> > > >
>> > > > That is a question to the AF_XDP maintainers... added them to this email.
>> > > >
>> > > > /* Rx/Tx descriptor */
>> > > > struct xdp_desc {
>> > > >       __u64 addr;
>> > > >       __u32 len;
>> > > >       __u32 options;
>> > > > };
>> > > >
>> > > > As far as I know, the xdp_desc.options field isn't used, right?
>> > >
>> > > The options holds flags, see also XDP_PKT_CONTD and XDP_TX_METADATA.
>> > >
>> > > > (Please AF_XDP experts, please verify below statements:)
>> > > > Something else we likely want to document: The available headroom in the
>> > > > AF_XDP frame.  When accessing the metadata in userspace AF_XDP we do a
>> > > > negative offset from the UMEM packet pointer.  IIRC on RX the available
>> > > > headroom will be either 255 or 192 bytes (depending on NIC drivers).
>> > > >
>> > > > Slightly confusing when AF_XDP transmitting from userspace the UMEM
>> > > > headroom is default zero (XSK_UMEM__DEFAULT_FRAME_HEADROOM is
>zero).
>> > > > This is configurable via xsk_umem_config.frame_headroom, like I did in
>> > > > this example[1].
>> > > >
>> > > > Maybe I did something wrong in[1], because I see that the new method is
>> > > > setting xsk_umem_config.tx_metadata_len + flag XDP_UMEM_TX_METADATA_LEN.
>> > > > This is nicely documented in [2]. How does this interact with setting
>> > > > xsk_umem_config.frame_headroom ?
>> > >
>> > > If you request XDP_UMEM_TX_METADATA_LEN then on TX side you can fill
>> > > struct xsk_tx_metadata before the start of packet data, that is,
>> > > meta = data - sizeof(struct xsk_tx_metadata). The validity of the
>> > > latter is indicated via desc->options |= XDP_TX_METADATA and then
>> > > you fill meta->flags with things like XDP_TXMD_FLAGS_CHECKSUM to
>> > > tell that the related fields are valid (ex. request.csum_start,
>> > > request.csum_offset) and that you expect the driver to do the
>> > > offload with this info. This is also what I mentioned in the other
>> > > thread some time ago that imho it would make sense to have this also
>> > > on RX side somewhat similar to virtio_net_hdr..
>> >
>> > Let's at least document the current behavior where some (small minority of)
>> > drivers can reuse the rx metadata area for some of its state? If we want
>> > to improve on that by adding another knob, we can follow up?
>> > (but I remember last time it was discussed, about a year ago, people
>> > were not enthusiastic about another parameter exported as uapi)
>>
>> But its still fundamentally broken no? Unless there is no harm for BPF devs
>> to override that rx metadata area when the pkt later on goes up the stack, but
>> it sounds this is not the case here. Iiuc, Yoong is trying a different approach
>> now to prepend before data_hard_start [0]?

I plan to retrieve the timestamp from metadata area and put it in xdp_buff_xsk.cb
area via struct igc_xdp_buff.

>> Then if BPF prog needs it, igc
>> already implements xmo_rx_timestamp callback which can copy it from there.

>>
>>   [0] https://lore.kernel.org/bpf/20250707191742.662aeffb@kernel.org/
>
>True, Jakub mentioned the same thread to me. This is, indeed, a better
>idea!

Would it be advisable to update the documentation to indicate that
drivers are expected to copy any device-reserved metadata from the
metadata area? This would ensure that xdp_buff->data_meta is equal
to xdp_buff->data before a BPF program is executed. This approach
would allow BPF programs to freely manipulate the metadata area
in XDP_REDIRECT scenarios.

Additionally, I am uncertain about the need to overriding metadata in
XDP_PASS scenarios. Should BPF programs refrain from overriding the
metadata in this case?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling
  2025-07-10 15:35                 ` Song, Yoong Siang
@ 2025-07-10 17:28                   ` Jakub Kicinski
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2025-07-10 17:28 UTC (permalink / raw)
  To: Song, Yoong Siang
  Cc: Stanislav Fomichev, Daniel Borkmann, Jesper Dangaard Brouer,
	David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Alexei Starovoitov, John Fastabend,
	Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, Mykola Lysenko, Shuah Khan, Magnus Karlsson,
	Björn Töpel, Fijalkowski, Maciej, Jonathan Lemon,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org

On Thu, 10 Jul 2025 15:35:32 +0000 Song, Yoong Siang wrote:
> Would it be advisable to update the documentation to indicate that
> drivers are expected to copy any device-reserved metadata from the
> metadata area? This would ensure that xdp_buff->data_meta is equal
> to xdp_buff->data before a BPF program is executed. This approach
> would allow BPF programs to freely manipulate the metadata area
> in XDP_REDIRECT scenarios.

Documenting sounds good.

> Additionally, I am uncertain about the need to overriding metadata in
> XDP_PASS scenarios. Should BPF programs refrain from overriding the
> metadata in this case?

IIRC XDP_PASS was the initial use case for the metadata area.
The driver needs to evacuate any HW metadata before handing over
to the XDP program.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-07-10 17:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-02 16:57 [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling Song Yoong Siang
2025-07-02 16:57 ` [PATCH bpf-next,v3 1/2] doc: enhance explanation of XDP Rx metadata layout and METADATA_SIZE Song Yoong Siang
2025-07-03 15:57   ` Daniel Borkmann
2025-07-04  0:33     ` Song, Yoong Siang
2025-07-02 16:57 ` [PATCH bpf-next,v3 2/2] selftests/bpf: Enhance XDP Rx metadata handling Song Yoong Siang
2025-07-03 17:04   ` Jesper Dangaard Brouer
2025-07-04  1:17     ` Song, Yoong Siang
2025-07-04  9:58       ` Jesper Dangaard Brouer
2025-07-04 11:38         ` Daniel Borkmann
2025-07-07 15:03           ` Stanislav Fomichev
2025-07-09 14:00             ` Daniel Borkmann
2025-07-09 16:29               ` Stanislav Fomichev
2025-07-10 15:35                 ` Song, Yoong Siang
2025-07-10 17:28                   ` Jakub Kicinski
2025-07-03 15:41 ` [PATCH bpf-next,v3 0/2] Clarify and Enhance XDP Rx Metadata Handling Stanislav Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).