* [PATCH bpf-next v3 0/3] xsk: TX metadata Launch Time support
@ 2023-12-03 16:51 Song Yoong Siang
  2023-12-03 16:51 ` [PATCH bpf-next v3 1/3] xsk: add Launch Time HW offload to XDP Tx metadata Song Yoong Siang
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Song Yoong Siang @ 2023-12-03 16:51 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest, Song Yoong Siang
This series expands XDP TX metadata framework to include ETF HW offload.
Changes since v1: (Willem)
- rename Time-Based Scheduling (TBS) to Earliest TxTime First (ETF)
- rename launch-time to txtime
Changes since v2: (Jesper & Willem)
- rename to use launch time
- change the default launch time in xdp_hw_metadata apps from 1s to 0.1s
  because some NICs do not support such a large future time.
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20231130162028.852006-1-yoong.siang.song@intel.com/
v2: https://patchwork.kernel.org/project/netdevbpf/cover/20231201062421.1074768-1-yoong.siang.song@intel.com/
Song Yoong Siang (3):
  xsk: add Launch Time HW offload to XDP Tx metadata
  net: stmmac: add Launch Time support to XDP ZC
  selftests/bpf: add Launch Time request to xdp_hw_metadata
 Documentation/netlink/specs/netdev.yaml       |  4 ++++
 Documentation/networking/xsk-tx-metadata.rst  |  4 ++++
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |  2 ++
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +++++++++++++
 include/net/xdp_sock.h                        | 10 ++++++++++
 include/net/xdp_sock_drv.h                    |  1 +
 include/uapi/linux/if_xdp.h                   |  9 +++++++++
 include/uapi/linux/netdev.h                   |  3 +++
 net/core/netdev-genl.c                        |  2 ++
 net/xdp/xsk.c                                 |  3 +++
 tools/include/uapi/linux/if_xdp.h             |  9 +++++++++
 tools/include/uapi/linux/netdev.h             |  3 +++
 tools/net/ynl/generated/netdev-user.c         |  1 +
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 19 ++++++++++++++++++-
 14 files changed, 82 insertions(+), 1 deletion(-)
-- 
2.34.1
^ permalink raw reply	[flat|nested] 19+ messages in thread
* [PATCH bpf-next v3 1/3] xsk: add Launch Time HW offload to XDP Tx metadata
  2023-12-03 16:51 [PATCH bpf-next v3 0/3] xsk: TX metadata Launch Time support Song Yoong Siang
@ 2023-12-03 16:51 ` Song Yoong Siang
  2023-12-03 16:51 ` [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC Song Yoong Siang
  2023-12-03 16:51 ` [PATCH bpf-next v3 3/3] selftests/bpf: add Launch Time request to xdp_hw_metadata Song Yoong Siang
  2 siblings, 0 replies; 19+ messages in thread
From: Song Yoong Siang @ 2023-12-03 16:51 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest, Song Yoong Siang
This patch extends the XDP Tx metadata framework so that user can requests
Launch Time hardware offload, where the NIC will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 Documentation/netlink/specs/netdev.yaml      |  4 ++++
 Documentation/networking/xsk-tx-metadata.rst |  4 ++++
 include/net/xdp_sock.h                       | 10 ++++++++++
 include/net/xdp_sock_drv.h                   |  1 +
 include/uapi/linux/if_xdp.h                  |  9 +++++++++
 include/uapi/linux/netdev.h                  |  3 +++
 net/core/netdev-genl.c                       |  2 ++
 net/xdp/xsk.c                                |  3 +++
 tools/include/uapi/linux/if_xdp.h            |  9 +++++++++
 tools/include/uapi/linux/netdev.h            |  3 +++
 tools/net/ynl/generated/netdev-user.c        |  1 +
 11 files changed, 49 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index eef6358ec587..a8bec7ddbdb0 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -66,6 +66,10 @@ definitions:
         name: tx-checksum
         doc:
           L3 checksum HW offload is supported by the driver.
+      -
+        name: launch-time
+        doc:
+          Launch Time HW offload is supported by the driver.
 
 attribute-sets:
   -
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
index 97ecfa480d00..6d53e191d4ae 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -44,6 +44,9 @@ The flags field enables the particular offload:
   checksum. ``csum_start`` specifies byte offset of where the checksumming
   should start and ``csum_offset`` specifies byte offset where the
   device should store the computed checksum.
+- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests Launch Time HW offload to
+  launch the packet at a pre-determined time. ``launch_time`` indicates
+  the time which the NIC should schedule the packet for transmission.
 
 Besides the flags above, in order to trigger the offloads, the first
 packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
@@ -68,6 +71,7 @@ Refer to ``xsk-flags`` features bitmask in
 
 - ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
 - ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
+- ``launch-time``: device supports ``XDP_TXMD_FLAGS_LAUNCH_TIME``
 
 See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
 
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 3cb4dc9bd70e..516baa8a3b1f 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -110,11 +110,16 @@ struct xdp_sock {
  *     indicates position where checksumming should start.
  *     csum_offset indicates position where checksum should be stored.
  *
+ * void (*tmo_request_launch_time)(u64 launch_time, void *priv)
+ *     Called when AF_XDP frame requested Launch Time HW offload support.
+ *     launch_time indicates the time which the NIC should schedule the packet for
+ *     transmission.
  */
 struct xsk_tx_metadata_ops {
 	void	(*tmo_request_timestamp)(void *priv);
 	u64	(*tmo_fill_timestamp)(void *priv);
 	void	(*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv);
+	void	(*tmo_request_launch_time)(u64 launch_time, void *priv);
 };
 
 #ifdef CONFIG_XDP_SOCKETS
@@ -170,6 +175,11 @@ static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
 		if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM)
 			ops->tmo_request_checksum(meta->request.csum_start,
 						  meta->request.csum_offset, priv);
+
+	if (ops->tmo_request_launch_time)
+		if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
+			ops->tmo_request_launch_time(meta->request.launch_time,
+						     priv);
 }
 
 /**
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 81e02de3f453..5b88559e956b 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -168,6 +168,7 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 #define XDP_TXMD_FLAGS_VALID ( \
 		XDP_TXMD_FLAGS_TIMESTAMP | \
 		XDP_TXMD_FLAGS_CHECKSUM | \
+		XDP_TXMD_FLAGS_LAUNCH_TIME | \
 	0)
 
 static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta)
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index d31698410410..e81bed455ec1 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -123,6 +123,12 @@ struct xdp_options {
  */
 #define XDP_TXMD_FLAGS_CHECKSUM			(1 << 1)
 
+/* Request Launch Time HW offload to launch the packet at a pre-determined time.
+ * The time which the NIC should schedule the packet for transmission is
+ * communicated via launch_time field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_LAUNCH_TIME		(1 << 2)
+
 /* AF_XDP offloads request. 'request' union member is consumed by the driver
  * when the packet is being transmitted. 'completion' union member is
  * filled by the driver when the transmit completion arrives.
@@ -138,6 +144,9 @@ struct xsk_tx_metadata {
 			__u16 csum_start;
 			/* Offset from csum_start where checksum should be stored. */
 			__u16 csum_offset;
+
+			/* XDP_TXMD_FLAGS_LAUNCH_TIME */
+			__u64 launch_time;
 		} request;
 
 		struct {
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 6244c0164976..b77baf1cffd7 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -56,10 +56,13 @@ enum netdev_xdp_rx_metadata {
  *   by the driver.
  * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
  *   driver.
+ * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the
+ *   driver.
  */
 enum netdev_xsk_flags {
 	NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
 	NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
+	NETDEV_XSK_FLAGS_LAUNCH_TIME = 3,
 };
 
 enum {
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 10f2124e9e23..c38521269b4e 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -33,6 +33,8 @@ XDP_METADATA_KFUNC_xxx
 			xsk_features |= NETDEV_XSK_FLAGS_TX_TIMESTAMP;
 		if (netdev->xsk_tx_metadata_ops->tmo_request_checksum)
 			xsk_features |= NETDEV_XSK_FLAGS_TX_CHECKSUM;
+		if (netdev->xsk_tx_metadata_ops->tmo_request_launch_time)
+			xsk_features |= NETDEV_XSK_FLAGS_LAUNCH_TIME;
 	}
 
 	if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) ||
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 281d49b4fca4..ad98ac6adb43 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -751,6 +751,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 						goto free_err;
 				}
 			}
+
+			if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
+				skb->skb_mstamp_ns = meta->request.launch_time;
 		}
 	}
 
diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 638c606dfa74..b9c5eeef0bfb 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -123,6 +123,12 @@ struct xdp_options {
  */
 #define XDP_TXMD_FLAGS_CHECKSUM			(1 << 1)
 
+/* Request Launch Time HW offload to launch the packet at a pre-determined time.
+ * The time which the NIC should schedule the packet for transmission is
+ * communicated via launch_time field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_LAUNCH_TIME		(1 << 2)
+
 /* AF_XDP offloads request. 'request' union member is consumed by the driver
  * when the packet is being transmitted. 'completion' union member is
  * filled by the driver when the transmit completion arrives.
@@ -138,6 +144,9 @@ struct xsk_tx_metadata {
 			__u16 csum_start;
 			/* Offset from csum_start where checksum should be stored. */
 			__u16 csum_offset;
+
+			/* XDP_TXMD_FLAGS_LAUNCH_TIME */
+			__u64 launch_time;
 		} request;
 
 		struct {
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 6244c0164976..b77baf1cffd7 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -56,10 +56,13 @@ enum netdev_xdp_rx_metadata {
  *   by the driver.
  * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
  *   driver.
+ * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the
+ *   driver.
  */
 enum netdev_xsk_flags {
 	NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
 	NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
+	NETDEV_XSK_FLAGS_LAUNCH_TIME = 3,
 };
 
 enum {
diff --git a/tools/net/ynl/generated/netdev-user.c b/tools/net/ynl/generated/netdev-user.c
index 3b9dee94d4ce..5f20b23b64ba 100644
--- a/tools/net/ynl/generated/netdev-user.c
+++ b/tools/net/ynl/generated/netdev-user.c
@@ -66,6 +66,7 @@ const char *netdev_xdp_rx_metadata_str(enum netdev_xdp_rx_metadata value)
 static const char * const netdev_xsk_flags_strmap[] = {
 	[0] = "tx-timestamp",
 	[1] = "tx-checksum",
+	[2] = "launch-time"
 };
 
 const char *netdev_xsk_flags_str(enum netdev_xsk_flags value)
-- 
2.34.1
^ permalink raw reply related	[flat|nested] 19+ messages in thread
* [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-03 16:51 [PATCH bpf-next v3 0/3] xsk: TX metadata Launch Time support Song Yoong Siang
  2023-12-03 16:51 ` [PATCH bpf-next v3 1/3] xsk: add Launch Time HW offload to XDP Tx metadata Song Yoong Siang
@ 2023-12-03 16:51 ` Song Yoong Siang
  2023-12-04 10:36   ` Jesper Dangaard Brouer
  2023-12-03 16:51 ` [PATCH bpf-next v3 3/3] selftests/bpf: add Launch Time request to xdp_hw_metadata Song Yoong Siang
  2 siblings, 1 reply; 19+ messages in thread
From: Song Yoong Siang @ 2023-12-03 16:51 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest, Song Yoong Siang
This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
copy via XDP Tx metadata framework.
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 13 +++++++++++++
 2 files changed, 15 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 686c94c2e8a7..e8538af6e207 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -105,6 +105,8 @@ struct stmmac_metadata_request {
 	struct stmmac_priv *priv;
 	struct dma_desc *tx_desc;
 	bool *set_ic;
+	struct dma_edesc *edesc;
+	int tbs;
 };
 
 struct stmmac_xsk_tx_complete {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c2ac88aaffed..1fe80bfae24b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2465,9 +2465,20 @@ static u64 stmmac_xsk_fill_timestamp(void *_priv)
 	return 0;
 }
 
+static void stmmac_xsk_request_launch_time(u64 launch_time, void *_priv)
+{
+	struct stmmac_metadata_request *meta_req = _priv;
+	struct timespec64 ts = ns_to_timespec64(launch_time);
+
+	if (meta_req->tbs & STMMAC_TBS_EN)
+		stmmac_set_desc_tbs(meta_req->priv, meta_req->edesc, ts.tv_sec,
+				    ts.tv_nsec);
+}
+
 static const struct xsk_tx_metadata_ops stmmac_xsk_tx_metadata_ops = {
 	.tmo_request_timestamp		= stmmac_xsk_request_timestamp,
 	.tmo_fill_timestamp		= stmmac_xsk_fill_timestamp,
+	.tmo_request_launch_time	= stmmac_xsk_request_launch_time,
 };
 
 static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -2545,6 +2556,8 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
 		meta_req.priv = priv;
 		meta_req.tx_desc = tx_desc;
 		meta_req.set_ic = &set_ic;
+		meta_req.tbs = tx_q->tbs;
+		meta_req.edesc = &tx_q->dma_entx[entry];
 		xsk_tx_metadata_request(meta, &stmmac_xsk_tx_metadata_ops,
 					&meta_req);
 		if (set_ic) {
-- 
2.34.1
^ permalink raw reply related	[flat|nested] 19+ messages in thread
* [PATCH bpf-next v3 3/3] selftests/bpf: add Launch Time request to xdp_hw_metadata
  2023-12-03 16:51 [PATCH bpf-next v3 0/3] xsk: TX metadata Launch Time support Song Yoong Siang
  2023-12-03 16:51 ` [PATCH bpf-next v3 1/3] xsk: add Launch Time HW offload to XDP Tx metadata Song Yoong Siang
  2023-12-03 16:51 ` [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC Song Yoong Siang
@ 2023-12-03 16:51 ` Song Yoong Siang
  2023-12-04 10:16   ` Jesper Dangaard Brouer
  2 siblings, 1 reply; 19+ messages in thread
From: Song Yoong Siang @ 2023-12-03 16:51 UTC (permalink / raw)
  To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest, Song Yoong Siang
This patch adds Launch Time hw offload request to xdp_hw_metadata. User can
configure the delta of HW launch time to HW RX-time by using "-l" argument.
The default delta is set to 0.1 second.
This patch is tested with stmmac on Intel Tiger Lake platform. Refer to
result below, the delta between pre-determined launch time and actual HW
transmit complete time is around 24 us.
$ sudo ./xdp_hw_metadata eth0
...
xsk_ring_cons__peek: 1
0x55e577c3a7a8: rx_desc[0]->addr=80100 addr=80100 comp_addr=80100 EoP
No rx_hash err=-95
HW RX-time:   1677762523393813392 (sec:1677762523.3938) delta to User RX-time sec:0.0003 (259.290 usec)
XDP RX-time:   1677762523394050576 (sec:1677762523.3941) delta to User RX-time sec:0.0000 (22.106 usec)
0x55e577c3a7a8: ping-pong with csum=5619 (want 8626) csum_start=34 csum_offset=6
HW RX-time:   1677762523393813392 (sec:1677762523.3938) delta to HW Launch-time sec:0.1000 (100000.000 usec)
0x55e577c3a7a8: complete tx idx=0 addr=18
HW Launch-time:   1677762523493813392 (sec:1677762523.4938) delta to HW TX-complete-time sec:0.0000 (24.181 usec)
HW TX-complete-time:   1677762523493837573 (sec:1677762523.4938) delta to User TX-complete-time sec:0.0007 (737.636 usec)
XDP RX-time:   1677762523394050576 (sec:1677762523.3941) delta to User TX-complete-time sec:0.1005 (100524.633 usec)
HW RX-time:   1677762523393813392 (sec:1677762523.3938) delta to HW TX-complete-time sec:0.1000 (100024.181 usec)
0x55e577c3a7a8: complete rx idx=128 addr=80100
$ sudo ./xdp_hw_metadata eth0 -l 10000000
...
poll: 1 (0) skip=17 fail=0 redir=17
xsk_ring_cons__peek: 1
0x558336d397a8: rx_desc[15]->addr=9e100 addr=9e100 comp_addr=9e100 EoP
No rx_hash err=-95
HW RX-time:   1677762699254666655 (sec:1677762699.2547) delta to User RX-time sec:0.0003 (256.928 usec)
XDP RX-time:   1677762699254901232 (sec:1677762699.2549) delta to User RX-time sec:0.0000 (22.351 usec)
0x558336d397a8: ping-pong with csum=5619 (want 8626) csum_start=34 csum_offset=6
HW RX-time:   1677762699254666655 (sec:1677762699.2547) delta to HW Launch-time sec:0.0100 (10000.000 usec)
0x558336d397a8: complete tx idx=15 addr=f018
HW Launch-time:   1677762699264666655 (sec:1677762699.2647) delta to HW TX-complete-time sec:0.0000 (24.307 usec)
HW TX-complete-time:   1677762699264690962 (sec:1677762699.2647) delta to User TX-complete-time sec:0.0003 (309.901 usec)
XDP RX-time:   1677762699254901232 (sec:1677762699.2549) delta to User TX-complete-time sec:0.0101 (10099.631 usec)
HW RX-time:   1677762699254666655 (sec:1677762699.2547) delta to HW TX-complete-time sec:0.0100 (10024.307 usec)
0x558336d397a8: complete rx idx=143 addr=9e100
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 tools/testing/selftests/bpf/xdp_hw_metadata.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 3291625ba4fb..3e238bb310b7 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -13,6 +13,7 @@
  * - UDP 9091 packets trigger TX reply
  * - TX HW timestamp is requested and reported back upon completion
  * - TX checksum is requested
+ * - HW launch time is set for transmission
  */
 
 #include <test_progs.h>
@@ -61,6 +62,8 @@ int rxq;
 bool skip_tx;
 __u64 last_hw_rx_timestamp;
 __u64 last_xdp_rx_timestamp;
+__u64 last_launch_time;
+__u64 launch_time_delta_to_hw_rx_timestamp = 100000000; /* 0.1 second */
 
 void test__fail(void) { /* for network_helpers.c */ }
 
@@ -274,6 +277,8 @@ static bool complete_tx(struct xsk *xsk, clockid_t clock_id)
 	if (meta->completion.tx_timestamp) {
 		__u64 ref_tstamp = gettime(clock_id);
 
+		print_tstamp_delta("HW Launch-time", "HW TX-complete-time",
+				   last_launch_time, meta->completion.tx_timestamp);
 		print_tstamp_delta("HW TX-complete-time", "User TX-complete-time",
 				   meta->completion.tx_timestamp, ref_tstamp);
 		print_tstamp_delta("XDP RX-time", "User TX-complete-time",
@@ -371,6 +376,14 @@ static void ping_pong(struct xsk *xsk, void *rx_packet, clockid_t clock_id)
 	       xsk, ntohs(udph->check), ntohs(want_csum),
 	       meta->request.csum_start, meta->request.csum_offset);
 
+	/* Set the value of launch time */
+	meta->flags |= XDP_TXMD_FLAGS_LAUNCH_TIME;
+	meta->request.launch_time = last_hw_rx_timestamp +
+				    launch_time_delta_to_hw_rx_timestamp;
+	last_launch_time = meta->request.launch_time;
+	print_tstamp_delta("HW RX-time", "HW Launch-time", last_hw_rx_timestamp,
+			   meta->request.launch_time);
+
 	memcpy(data, rx_packet, len); /* don't share umem chunk for simplicity */
 	tx_desc->options |= XDP_TX_METADATA;
 	tx_desc->len = len;
@@ -595,6 +608,7 @@ static void print_usage(void)
 		"  -h    Display this help and exit\n\n"
 		"  -m    Enable multi-buffer XDP for larger MTU\n"
 		"  -r    Don't generate AF_XDP reply (rx metadata only)\n"
+		"  -l    Delta of HW Launch-time to HW RX-time in ns (default: 0.1s)\n"
 		"Generate test packets on the other machine with:\n"
 		"  echo -n xdp | nc -u -q1 <dst_ip> 9091\n";
 
@@ -605,7 +619,7 @@ static void read_args(int argc, char *argv[])
 {
 	int opt;
 
-	while ((opt = getopt(argc, argv, "chmr")) != -1) {
+	while ((opt = getopt(argc, argv, "chmrl:")) != -1) {
 		switch (opt) {
 		case 'c':
 			bind_flags &= ~XDP_USE_NEED_WAKEUP;
@@ -621,6 +635,9 @@ static void read_args(int argc, char *argv[])
 		case 'r':
 			skip_tx = true;
 			break;
+		case 'l':
+			launch_time_delta_to_hw_rx_timestamp = atoll(optarg);
+			break;
 		case '?':
 			if (isprint(optopt))
 				fprintf(stderr, "Unknown option: -%c\n", optopt);
-- 
2.34.1
^ permalink raw reply related	[flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v3 3/3] selftests/bpf: add Launch Time request to xdp_hw_metadata
  2023-12-03 16:51 ` [PATCH bpf-next v3 3/3] selftests/bpf: add Launch Time request to xdp_hw_metadata Song Yoong Siang
@ 2023-12-04 10:16   ` Jesper Dangaard Brouer
  2023-12-04 14:59     ` Willem de Bruijn
  0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2023-12-04 10:16 UTC (permalink / raw)
  To: Song Yoong Siang, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
	Maciej Fijalkowski, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Stanislav Fomichev,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest
On 12/3/23 17:51, Song Yoong Siang wrote:
> This patch is tested with stmmac on Intel Tiger Lake platform. Refer to
> result below, the delta between pre-determined launch time and actual HW
> transmit complete time is around 24 us.
Is there any NIC setup (e.g. ethtool/qdisc) requirements to enable HW 
for this feature?
--Jesper
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-03 16:51 ` [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC Song Yoong Siang
@ 2023-12-04 10:36   ` Jesper Dangaard Brouer
  2023-12-04 11:54     ` [xdp-hints] " Florian Bezdeka
  2023-12-04 14:54     ` Willem de Bruijn
  0 siblings, 2 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2023-12-04 10:36 UTC (permalink / raw)
  To: Song Yoong Siang, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Bjorn Topel, Magnus Karlsson,
	Maciej Fijalkowski, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Stanislav Fomichev,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest
On 12/3/23 17:51, Song Yoong Siang wrote:
> This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> copy via XDP Tx metadata framework.
> 
> Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> ---
>   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
As requested before, I think we need to see another driver implementing 
this.
I propose driver igc and chip i225.
The interesting thing for me is to see how the LaunchTime max 1 second
into the future[1] is handled code wise. One suggestion is to add a 
section to Documentation/networking/xsk-tx-metadata.rst per driver that 
mentions/documents these different hardware limitations.  It is natural 
that different types of hardware have limitations.  This is a close-to 
hardware-level abstraction/API, and IMHO as long as we document the 
limitations we can expose this API without too many limitations for more 
capable hardware.
  [1] 
https://github.com/xdp-project/xdp-project/blob/master/areas/tsn/code01_follow_qdisc_TSN_offload.org#setup-code-driver-igb
This stmmac driver and Intel Tiger Lake CPU must also have some limit on 
how long into the future it will/can schedule packets?
People from xdp-hints list must make their voice hear if they want i210 
and igb driver support, because it have even-more hardware limitations, 
see [1] (E.g. only TX queue 0 and 1 supports LaunchTime). BUT I know 
some have this hardware in production and might be motivated to get a 
functioning driver with this feature?
--Jesper
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-04 10:36   ` Jesper Dangaard Brouer
@ 2023-12-04 11:54     ` Florian Bezdeka
  2023-12-04 14:54     ` Willem de Bruijn
  1 sibling, 0 replies; 19+ messages in thread
From: Florian Bezdeka @ 2023-12-04 11:54 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Song Yoong Siang, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Stanislav Fomichev, Lorenzo Bianconi, Tariq Toukan,
	Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest
On Mon, 2023-12-04 at 11:36 +0100, Jesper Dangaard Brouer wrote:
> On 12/3/23 17:51, Song Yoong Siang wrote:
> > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > copy via XDP Tx metadata framework.
> > 
> > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > ---
> >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> 
> As requested before, I think we need to see another driver implementing 
> this.
> 
> I propose driver igc and chip i225.
igc support would be really nice and highly appreciated. There are a
lot of tests running here with that chip (i225/i226) / driver (igc)
combination. Let me know if we can support somehow, testing included.
> 
> The interesting thing for me is to see how the LaunchTime max 1 second
> into the future[1] is handled code wise. One suggestion is to add a 
> section to Documentation/networking/xsk-tx-metadata.rst per driver that 
> mentions/documents these different hardware limitations.  It is natural 
> that different types of hardware have limitations.  This is a close-to 
> hardware-level abstraction/API, and IMHO as long as we document the 
> limitations we can expose this API without too many limitations for more 
> capable hardware.
> 
>   [1] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/tsn/code01_follow_qdisc_TSN_offload.org#setup-code-driver-igb
> 
> This stmmac driver and Intel Tiger Lake CPU must also have some limit on 
> how long into the future it will/can schedule packets?
> 
> 
> People from xdp-hints list must make their voice hear if they want i210 
> and igb driver support, because it have even-more hardware limitations, 
> see [1] (E.g. only TX queue 0 and 1 supports LaunchTime). BUT I know 
> some have this hardware in production and might be motivated to get a 
> functioning driver with this feature?
i210 support would be nice, that would allow us to compare some test
setups with different NICs. In addition it would simplify some test
setups. For now, IMHO igc is more important.
> 
> --Jesper
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-04 10:36   ` Jesper Dangaard Brouer
  2023-12-04 11:54     ` [xdp-hints] " Florian Bezdeka
@ 2023-12-04 14:54     ` Willem de Bruijn
  2023-12-05 15:25       ` Song, Yoong Siang
  1 sibling, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2023-12-04 14:54 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Song Yoong Siang, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Stanislav Fomichev, Lorenzo Bianconi, Tariq Toukan,
	Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest
Jesper Dangaard Brouer wrote:
> 
> 
> On 12/3/23 17:51, Song Yoong Siang wrote:
> > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > copy via XDP Tx metadata framework.
> > 
> > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > ---
> >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> 
> As requested before, I think we need to see another driver implementing 
> this.
> 
> I propose driver igc and chip i225.
> 
> The interesting thing for me is to see how the LaunchTime max 1 second
> into the future[1] is handled code wise. One suggestion is to add a 
> section to Documentation/networking/xsk-tx-metadata.rst per driver that 
> mentions/documents these different hardware limitations.  It is natural 
> that different types of hardware have limitations.  This is a close-to 
> hardware-level abstraction/API, and IMHO as long as we document the 
> limitations we can expose this API without too many limitations for more 
> capable hardware.
I would assume that the kfunc will fail when a value is passed that
cannot be programmed.
What is being implemented here already exists for qdiscs. The FQ
qdisc takes a horizon attribute and
    "
    when a packet is beyond the horizon
        at enqueue() time:
        - either drop the packet (default policy)
        - or cap its delivery time to the horizon.
    "
    commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
Having the admin manually configure this on the qdisc based on
off-line knowledge of the device is more fragile than if the device
would somehow signal its limit to the stack.
But I don't think we should add enforcement of that as a requirement
for this xdp extension of pacing.
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next v3 3/3] selftests/bpf: add Launch Time request to xdp_hw_metadata
  2023-12-04 10:16   ` Jesper Dangaard Brouer
@ 2023-12-04 14:59     ` Willem de Bruijn
  2023-12-05 14:55       ` [xdp-hints] " Song, Yoong Siang
  0 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2023-12-04 14:59 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Song Yoong Siang, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Bjorn Topel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Stanislav Fomichev, Lorenzo Bianconi, Tariq Toukan,
	Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu
  Cc: netdev, linux-kernel, linux-doc, bpf, xdp-hints, linux-stm32,
	linux-arm-kernel, linux-kselftest
Jesper Dangaard Brouer wrote:
> 
> 
> On 12/3/23 17:51, Song Yoong Siang wrote:
> > This patch is tested with stmmac on Intel Tiger Lake platform. Refer to
> > result below, the delta between pre-determined launch time and actual HW
> > transmit complete time is around 24 us.
> 
> Is there any NIC setup (e.g. ethtool/qdisc) requirements to enable HW 
> for this feature?
Judging from how we currently use this with FQ and ETF, no.
See for instance tools/testing/selftests/net/so_txtime.sh
^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [xdp-hints] Re: [PATCH bpf-next v3 3/3] selftests/bpf: add Launch Time request to xdp_hw_metadata
  2023-12-04 14:59     ` Willem de Bruijn
@ 2023-12-05 14:55       ` Song, Yoong Siang
  0 siblings, 0 replies; 19+ messages in thread
From: Song, Yoong Siang @ 2023-12-05 14:55 UTC (permalink / raw)
  To: Willem de Bruijn, Jesper Dangaard Brouer, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Bjorn Topel, Karlsson, Magnus, Fijalkowski, Maciej,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Stanislav Fomichev, Lorenzo Bianconi,
	Tariq Toukan, Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, bpf@vger.kernel.org,
	xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
On Monday, December 4, 2023 11:00 PM, Willem de Bruijn wrote:
>Jesper Dangaard Brouer wrote:
>>
>>
>> On 12/3/23 17:51, Song Yoong Siang wrote:
>> > This patch is tested with stmmac on Intel Tiger Lake platform. Refer to
>> > result below, the delta between pre-determined launch time and actual HW
>> > transmit complete time is around 24 us.
>>
>> Is there any NIC setup (e.g. ethtool/qdisc) requirements to enable HW
>> for this feature?
>
>Judging from how we currently use this with FQ and ETF, no.
>
>See for instance tools/testing/selftests/net/so_txtime.sh
For stmmac, we need to enable per queue launch time support
by using tc qdisc etf offload command.
I believe igc also using the same way.
I will add into documentation to make it clear that
XDP_TXMD_FLAGS_LAUNCH_TIME only used to pass the
launch time value, not enable launch time feature.
  
^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-04 14:54     ` Willem de Bruijn
@ 2023-12-05 15:25       ` Song, Yoong Siang
  2023-12-05 15:34         ` [xdp-hints] " Florian Bezdeka
  0 siblings, 1 reply; 19+ messages in thread
From: Song, Yoong Siang @ 2023-12-05 15:25 UTC (permalink / raw)
  To: Willem de Bruijn, Jesper Dangaard Brouer, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Bjorn Topel, Karlsson, Magnus, Fijalkowski, Maciej,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Stanislav Fomichev, Lorenzo Bianconi,
	Tariq Toukan, Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, bpf@vger.kernel.org,
	xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
>Jesper Dangaard Brouer wrote:
>>
>>
>> On 12/3/23 17:51, Song Yoong Siang wrote:
>> > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
>> > copy via XDP Tx metadata framework.
>> >
>> > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
>> > ---
>> >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
>>
>> As requested before, I think we need to see another driver implementing
>> this.
>>
>> I propose driver igc and chip i225.
Sure. I will include igc patches in next version.
>>
>> The interesting thing for me is to see how the LaunchTime max 1 second
>> into the future[1] is handled code wise. One suggestion is to add a
>> section to Documentation/networking/xsk-tx-metadata.rst per driver that
>> mentions/documents these different hardware limitations.  It is natural
>> that different types of hardware have limitations.  This is a close-to
>> hardware-level abstraction/API, and IMHO as long as we document the
>> limitations we can expose this API without too many limitations for more
>> capable hardware.
Sure. I will try to add hardware limitations in documentation. 
>
>I would assume that the kfunc will fail when a value is passed that
>cannot be programmed.
>
In current design, the xsk_tx_metadata_request() dint got return value. 
So user won't know if their request is fail. 
It is complex to inform user which request is failing. 
Therefore, IMHO, it is good that we let driver handle the error silently. 
>What is being implemented here already exists for qdiscs. The FQ
>qdisc takes a horizon attribute and
>
>    "
>    when a packet is beyond the horizon
>        at enqueue() time:
>        - either drop the packet (default policy)
>        - or cap its delivery time to the horizon.
>    "
>    commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
>
>Having the admin manually configure this on the qdisc based on
>off-line knowledge of the device is more fragile than if the device
>would somehow signal its limit to the stack.
>
>But I don't think we should add enforcement of that as a requirement
>for this xdp extension of pacing.
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-05 15:25       ` Song, Yoong Siang
@ 2023-12-05 15:34         ` Florian Bezdeka
  2023-12-05 17:13           ` Stanislav Fomichev
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Bezdeka @ 2023-12-05 15:34 UTC (permalink / raw)
  To: Song, Yoong Siang, Willem de Bruijn, Jesper Dangaard Brouer,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Bjorn Topel, Karlsson, Magnus,
	Fijalkowski, Maciej, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Stanislav Fomichev,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, bpf@vger.kernel.org,
	xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > Jesper Dangaard Brouer wrote:
> > > 
> > > 
> > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > copy via XDP Tx metadata framework.
> > > > 
> > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > > > ---
> > > >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> > > 
> > > As requested before, I think we need to see another driver implementing
> > > this.
> > > 
> > > I propose driver igc and chip i225.
> 
> Sure. I will include igc patches in next version.
> 
> > > 
> > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > into the future[1] is handled code wise. One suggestion is to add a
> > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > mentions/documents these different hardware limitations.  It is natural
> > > that different types of hardware have limitations.  This is a close-to
> > > hardware-level abstraction/API, and IMHO as long as we document the
> > > limitations we can expose this API without too many limitations for more
> > > capable hardware.
> 
> Sure. I will try to add hardware limitations in documentation. 
> 
> > 
> > I would assume that the kfunc will fail when a value is passed that
> > cannot be programmed.
> > 
> 
> In current design, the xsk_tx_metadata_request() dint got return value. 
> So user won't know if their request is fail. 
> It is complex to inform user which request is failing. 
> Therefore, IMHO, it is good that we let driver handle the error silently.
> 
If the programmed value is invalid, the packet will be "dropped" / will
never make it to the wire, right?
That is clearly a situation that the user should be informed about. For
RT systems this normally means that something is really wrong regarding
timing / cycle overflow. Such systems have to react on that situation.
>  
> 
> > What is being implemented here already exists for qdiscs. The FQ
> > qdisc takes a horizon attribute and
> > 
> >    "
> >    when a packet is beyond the horizon
> >        at enqueue() time:
> >        - either drop the packet (default policy)
> >        - or cap its delivery time to the horizon.
> >    "
> >    commit 39d010504e6b ("net_sched: sch_fq: add horizon attribute")
> > 
> > Having the admin manually configure this on the qdisc based on
> > off-line knowledge of the device is more fragile than if the device
> > would somehow signal its limit to the stack.
> > 
> > But I don't think we should add enforcement of that as a requirement
> > for this xdp extension of pacing.
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-05 15:34         ` [xdp-hints] " Florian Bezdeka
@ 2023-12-05 17:13           ` Stanislav Fomichev
  2023-12-05 18:03             ` Willem de Bruijn
  0 siblings, 1 reply; 19+ messages in thread
From: Stanislav Fomichev @ 2023-12-05 17:13 UTC (permalink / raw)
  To: Florian Bezdeka
  Cc: Song, Yoong Siang, Willem de Bruijn, Jesper Dangaard Brouer,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Bjorn Topel, Karlsson, Magnus,
	Fijalkowski, Maciej, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Lorenzo Bianconi, Tariq Toukan,
	Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, bpf@vger.kernel.org,
	xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
<florian.bezdeka@siemens.com> wrote:
>
> On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > Jesper Dangaard Brouer wrote:
> > > >
> > > >
> > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > copy via XDP Tx metadata framework.
> > > > >
> > > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > > > > ---
> > > > >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> > > >
> > > > As requested before, I think we need to see another driver implementing
> > > > this.
> > > >
> > > > I propose driver igc and chip i225.
> >
> > Sure. I will include igc patches in next version.
> >
> > > >
> > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > mentions/documents these different hardware limitations.  It is natural
> > > > that different types of hardware have limitations.  This is a close-to
> > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > limitations we can expose this API without too many limitations for more
> > > > capable hardware.
> >
> > Sure. I will try to add hardware limitations in documentation.
> >
> > >
> > > I would assume that the kfunc will fail when a value is passed that
> > > cannot be programmed.
> > >
> >
> > In current design, the xsk_tx_metadata_request() dint got return value.
> > So user won't know if their request is fail.
> > It is complex to inform user which request is failing.
> > Therefore, IMHO, it is good that we let driver handle the error silently.
> >
>
> If the programmed value is invalid, the packet will be "dropped" / will
> never make it to the wire, right?
>
> That is clearly a situation that the user should be informed about. For
> RT systems this normally means that something is really wrong regarding
> timing / cycle overflow. Such systems have to react on that situation.
In general, af_xdp is a bit lacking in this 'notify the user that they
somehow messed up' area :-(
For example, pushing a tx descriptor with a wrong addr/len in zc mode
will not give any visible signal back (besides driver potentially
spilling something into dmesg as it was in the mlx case).
We can probably start with having some counters for these events?
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-05 17:13           ` Stanislav Fomichev
@ 2023-12-05 18:03             ` Willem de Bruijn
  2023-12-05 19:39               ` Stanislav Fomichev
  0 siblings, 1 reply; 19+ messages in thread
From: Willem de Bruijn @ 2023-12-05 18:03 UTC (permalink / raw)
  To: Stanislav Fomichev, Florian Bezdeka
  Cc: Song, Yoong Siang, Willem de Bruijn, Jesper Dangaard Brouer,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Bjorn Topel, Karlsson, Magnus,
	Fijalkowski, Maciej, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Lorenzo Bianconi, Tariq Toukan,
	Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, bpf@vger.kernel.org,
	xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
Stanislav Fomichev wrote:
> On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> <florian.bezdeka@siemens.com> wrote:
> >
> > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > Jesper Dangaard Brouer wrote:
> > > > >
> > > > >
> > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > copy via XDP Tx metadata framework.
> > > > > >
> > > > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > > > > > ---
> > > > > >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> > > > >
> > > > > As requested before, I think we need to see another driver implementing
> > > > > this.
> > > > >
> > > > > I propose driver igc and chip i225.
> > >
> > > Sure. I will include igc patches in next version.
> > >
> > > > >
> > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > mentions/documents these different hardware limitations.  It is natural
> > > > > that different types of hardware have limitations.  This is a close-to
> > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > limitations we can expose this API without too many limitations for more
> > > > > capable hardware.
> > >
> > > Sure. I will try to add hardware limitations in documentation.
> > >
> > > >
> > > > I would assume that the kfunc will fail when a value is passed that
> > > > cannot be programmed.
> > > >
> > >
> > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > So user won't know if their request is fail.
> > > It is complex to inform user which request is failing.
> > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > >
> >
> > If the programmed value is invalid, the packet will be "dropped" / will
> > never make it to the wire, right?
Programmable behavior is to either drop or cap to some boundary
value, such as the farthest programmable time in the future: the
horizon. In fq:
                /* Check if packet timestamp is too far in the future. */
                if (fq_packet_beyond_horizon(skb, q, now)) {
                        if (q->horizon_drop) {
                                        q->stat_horizon_drops++;
                                        return qdisc_drop(skb, sch, to_free);
                        }
                        q->stat_horizon_caps++;
                        skb->tstamp = now + q->horizon;
                }
                fq_skb_cb(skb)->time_to_send = skb->tstamp;
Drop is the more obviously correct mode.
Programming with a clock source that the driver does not support will
then be a persistent failure.
Preferably, this driver capability can be queried beforehand (rather
than only through reading error counters afterwards).
Perhaps it should not be a driver task to convert from possibly
multiple clock sources to the device native clock. Right now, we do
use per-device timecounters for this, implemented in the driver.
As for which clocks are relevant. For PTP, I suppose the device PHC,
converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
> >
> > That is clearly a situation that the user should be informed about. For
> > RT systems this normally means that something is really wrong regarding
> > timing / cycle overflow. Such systems have to react on that situation.
> 
> In general, af_xdp is a bit lacking in this 'notify the user that they
> somehow messed up' area :-(
> For example, pushing a tx descriptor with a wrong addr/len in zc mode
> will not give any visible signal back (besides driver potentially
> spilling something into dmesg as it was in the mlx case).
> We can probably start with having some counters for these events?
This is because the AF_XDP completion queue descriptor format is only
a u64 address?
Could error conditions be reported on tx completion in the metadata,
using xsk_tx_metadata_complete?
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-05 18:03             ` Willem de Bruijn
@ 2023-12-05 19:39               ` Stanislav Fomichev
  2023-12-05 20:01                 ` Willem de Bruijn
  2023-12-06 10:06                 ` Magnus Karlsson
  0 siblings, 2 replies; 19+ messages in thread
From: Stanislav Fomichev @ 2023-12-05 19:39 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Florian Bezdeka, yoong.siang.song, Jesper Dangaard Brouer, davem,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Bjorn Topel, magnus.karlsson, maciej.fijalkowski, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	bpf@vger.kernel.org, xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
On 12/05, Willem de Bruijn wrote:
> Stanislav Fomichev wrote:
> > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> > <florian.bezdeka@siemens.com> wrote:
> > >
> > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > Jesper Dangaard Brouer wrote:
> > > > > >
> > > > > >
> > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > copy via XDP Tx metadata framework.
> > > > > > >
> > > > > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > > > > > > ---
> > > > > > >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> > > > > >
> > > > > > As requested before, I think we need to see another driver implementing
> > > > > > this.
> > > > > >
> > > > > > I propose driver igc and chip i225.
> > > >
> > > > Sure. I will include igc patches in next version.
> > > >
> > > > > >
> > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > mentions/documents these different hardware limitations.  It is natural
> > > > > > that different types of hardware have limitations.  This is a close-to
> > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > limitations we can expose this API without too many limitations for more
> > > > > > capable hardware.
> > > >
> > > > Sure. I will try to add hardware limitations in documentation.
> > > >
> > > > >
> > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > cannot be programmed.
> > > > >
> > > >
> > > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > > So user won't know if their request is fail.
> > > > It is complex to inform user which request is failing.
> > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > >
> > >
> > > If the programmed value is invalid, the packet will be "dropped" / will
> > > never make it to the wire, right?
> 
> Programmable behavior is to either drop or cap to some boundary
> value, such as the farthest programmable time in the future: the
> horizon. In fq:
> 
>                 /* Check if packet timestamp is too far in the future. */
>                 if (fq_packet_beyond_horizon(skb, q, now)) {
>                         if (q->horizon_drop) {
>                                         q->stat_horizon_drops++;
>                                         return qdisc_drop(skb, sch, to_free);
>                         }
>                         q->stat_horizon_caps++;
>                         skb->tstamp = now + q->horizon;
>                 }
>                 fq_skb_cb(skb)->time_to_send = skb->tstamp;
> 
> Drop is the more obviously correct mode.
> 
> Programming with a clock source that the driver does not support will
> then be a persistent failure.
> 
> Preferably, this driver capability can be queried beforehand (rather
> than only through reading error counters afterwards).
> 
> Perhaps it should not be a driver task to convert from possibly
> multiple clock sources to the device native clock. Right now, we do
> use per-device timecounters for this, implemented in the driver.
> 
> As for which clocks are relevant. For PTP, I suppose the device PHC,
> converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
Do we need to expose some generic netdev netlink apis to query/adjust
nic clock sources (or maybe there is something existing already)?
Then the userspace can be responsible for syncing/converting the
timestamps to the internal nic clocks. +1 to trying to avoid doing
this in the drivers.
> > > That is clearly a situation that the user should be informed about. For
> > > RT systems this normally means that something is really wrong regarding
> > > timing / cycle overflow. Such systems have to react on that situation.
> > 
> > In general, af_xdp is a bit lacking in this 'notify the user that they
> > somehow messed up' area :-(
> > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > will not give any visible signal back (besides driver potentially
> > spilling something into dmesg as it was in the mlx case).
> > We can probably start with having some counters for these events?
> 
> This is because the AF_XDP completion queue descriptor format is only
> a u64 address?
Yeah. XDP_COPY mode has the descriptor validation which is exported via
recvmsg errno, but zerocopy path seems to be too deep in the stack
to report something back. And there is no place, as you mention,
in the completion ring to report the status.
> Could error conditions be reported on tx completion in the metadata,
> using xsk_tx_metadata_complete?
That would be one way to do it, yes. But then the error reporting depends
on the metadata opt-in. Having a separate ring to export the errors,
or having a v2 tx-completions layout with extra 'status' field would also
work.
But this seems like something that should be handled separately? Because
we'd have to teach all existing zc drivers to report those errors back
instead of dropping these descriptors..
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-05 19:39               ` Stanislav Fomichev
@ 2023-12-05 20:01                 ` Willem de Bruijn
  2023-12-06 10:06                 ` Magnus Karlsson
  1 sibling, 0 replies; 19+ messages in thread
From: Willem de Bruijn @ 2023-12-05 20:01 UTC (permalink / raw)
  To: Stanislav Fomichev, Willem de Bruijn
  Cc: Florian Bezdeka, yoong.siang.song, Jesper Dangaard Brouer, davem,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
	Bjorn Topel, magnus.karlsson, maciej.fijalkowski, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
	Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Alexandre Torgue, Jose Abreu, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	bpf@vger.kernel.org, xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
Stanislav Fomichev wrote:
> On 12/05, Willem de Bruijn wrote:
> > Stanislav Fomichev wrote:
> > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> > > <florian.bezdeka@siemens.com> wrote:
> > > >
> > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > Jesper Dangaard Brouer wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > >
> > > > > > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > > > > > > > ---
> > > > > > > >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> > > > > > >
> > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > this.
> > > > > > >
> > > > > > > I propose driver igc and chip i225.
> > > > >
> > > > > Sure. I will include igc patches in next version.
> > > > >
> > > > > > >
> > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > mentions/documents these different hardware limitations.  It is natural
> > > > > > > that different types of hardware have limitations.  This is a close-to
> > > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > > limitations we can expose this API without too many limitations for more
> > > > > > > capable hardware.
> > > > >
> > > > > Sure. I will try to add hardware limitations in documentation.
> > > > >
> > > > > >
> > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > cannot be programmed.
> > > > > >
> > > > >
> > > > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > > > So user won't know if their request is fail.
> > > > > It is complex to inform user which request is failing.
> > > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > > >
> > > >
> > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > never make it to the wire, right?
> > 
> > Programmable behavior is to either drop or cap to some boundary
> > value, such as the farthest programmable time in the future: the
> > horizon. In fq:
> > 
> >                 /* Check if packet timestamp is too far in the future. */
> >                 if (fq_packet_beyond_horizon(skb, q, now)) {
> >                         if (q->horizon_drop) {
> >                                         q->stat_horizon_drops++;
> >                                         return qdisc_drop(skb, sch, to_free);
> >                         }
> >                         q->stat_horizon_caps++;
> >                         skb->tstamp = now + q->horizon;
> >                 }
> >                 fq_skb_cb(skb)->time_to_send = skb->tstamp;
> > 
> > Drop is the more obviously correct mode.
> > 
> > Programming with a clock source that the driver does not support will
> > then be a persistent failure.
> > 
> > Preferably, this driver capability can be queried beforehand (rather
> > than only through reading error counters afterwards).
> > 
> > Perhaps it should not be a driver task to convert from possibly
> > multiple clock sources to the device native clock. Right now, we do
> > use per-device timecounters for this, implemented in the driver.
> > 
> > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
> 
> Do we need to expose some generic netdev netlink apis to query/adjust
> nic clock sources (or maybe there is something existing already)?
> Then the userspace can be responsible for syncing/converting the
> timestamps to the internal nic clocks. +1 to trying to avoid doing
> this in the drivers.
Perhaps. I'm just a bit hesitant since that is UAPI and this is all
quite hand-wavy still.
Some of the conversion necessarily has to be in the driver. Only the
driver knows the descriptor format, and limitations of that, such as
the bit-width that can be encoded.
If we cannot move anything out of the drivers (quite likely), then
agreed that a netdev/ethtool netlink query approach is helpful.
To be clear: I don't mean that that should be part of this series.
This is not an XSK specific concern.
> > > > That is clearly a situation that the user should be informed about. For
> > > > RT systems this normally means that something is really wrong regarding
> > > > timing / cycle overflow. Such systems have to react on that situation.
> > > 
> > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > somehow messed up' area :-(
> > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > will not give any visible signal back (besides driver potentially
> > > spilling something into dmesg as it was in the mlx case).
> > > We can probably start with having some counters for these events?
> > 
> > This is because the AF_XDP completion queue descriptor format is only
> > a u64 address?
> 
> Yeah. XDP_COPY mode has the descriptor validation which is exported via
> recvmsg errno, but zerocopy path seems to be too deep in the stack
> to report something back. And there is no place, as you mention,
> in the completion ring to report the status.
> 
> > Could error conditions be reported on tx completion in the metadata,
> > using xsk_tx_metadata_complete?
> 
> That would be one way to do it, yes. But then the error reporting depends
> on the metadata opt-in. Having a separate ring to export the errors,
> or having a v2 tx-completions layout with extra 'status' field would also
> work.
> 
> But this seems like something that should be handled separately? Because
> we'd have to teach all existing zc drivers to report those errors back
> instead of dropping these descriptors..
Agreed on both points :) A v2 tx-completions that supports status
could be useful. But again, this is out of scope of this specific
launch time feature.
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-05 19:39               ` Stanislav Fomichev
  2023-12-05 20:01                 ` Willem de Bruijn
@ 2023-12-06 10:06                 ` Magnus Karlsson
  2023-12-06 19:06                   ` Stanislav Fomichev
  1 sibling, 1 reply; 19+ messages in thread
From: Magnus Karlsson @ 2023-12-06 10:06 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Willem de Bruijn, Florian Bezdeka, yoong.siang.song,
	Jesper Dangaard Brouer, davem, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Bjorn Topel, magnus.karlsson,
	maciej.fijalkowski, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Lorenzo Bianconi, Tariq Toukan,
	Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, bpf@vger.kernel.org,
	xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
On Tue, 5 Dec 2023 at 20:39, Stanislav Fomichev <sdf@google.com> wrote:
>
> On 12/05, Willem de Bruijn wrote:
> > Stanislav Fomichev wrote:
> > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> > > <florian.bezdeka@siemens.com> wrote:
> > > >
> > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > Jesper Dangaard Brouer wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > >
> > > > > > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > > > > > > > ---
> > > > > > > >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> > > > > > >
> > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > this.
> > > > > > >
> > > > > > > I propose driver igc and chip i225.
> > > > >
> > > > > Sure. I will include igc patches in next version.
> > > > >
> > > > > > >
> > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > mentions/documents these different hardware limitations.  It is natural
> > > > > > > that different types of hardware have limitations.  This is a close-to
> > > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > > limitations we can expose this API without too many limitations for more
> > > > > > > capable hardware.
> > > > >
> > > > > Sure. I will try to add hardware limitations in documentation.
> > > > >
> > > > > >
> > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > cannot be programmed.
> > > > > >
> > > > >
> > > > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > > > So user won't know if their request is fail.
> > > > > It is complex to inform user which request is failing.
> > > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > > >
> > > >
> > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > never make it to the wire, right?
> >
> > Programmable behavior is to either drop or cap to some boundary
> > value, such as the farthest programmable time in the future: the
> > horizon. In fq:
> >
> >                 /* Check if packet timestamp is too far in the future. */
> >                 if (fq_packet_beyond_horizon(skb, q, now)) {
> >                         if (q->horizon_drop) {
> >                                         q->stat_horizon_drops++;
> >                                         return qdisc_drop(skb, sch, to_free);
> >                         }
> >                         q->stat_horizon_caps++;
> >                         skb->tstamp = now + q->horizon;
> >                 }
> >                 fq_skb_cb(skb)->time_to_send = skb->tstamp;
> >
> > Drop is the more obviously correct mode.
> >
> > Programming with a clock source that the driver does not support will
> > then be a persistent failure.
> >
> > Preferably, this driver capability can be queried beforehand (rather
> > than only through reading error counters afterwards).
> >
> > Perhaps it should not be a driver task to convert from possibly
> > multiple clock sources to the device native clock. Right now, we do
> > use per-device timecounters for this, implemented in the driver.
> >
> > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
>
> Do we need to expose some generic netdev netlink apis to query/adjust
> nic clock sources (or maybe there is something existing already)?
> Then the userspace can be responsible for syncing/converting the
> timestamps to the internal nic clocks. +1 to trying to avoid doing
> this in the drivers.
>
> > > > That is clearly a situation that the user should be informed about. For
> > > > RT systems this normally means that something is really wrong regarding
> > > > timing / cycle overflow. Such systems have to react on that situation.
> > >
> > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > somehow messed up' area :-(
> > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > will not give any visible signal back (besides driver potentially
> > > spilling something into dmesg as it was in the mlx case).
> > > We can probably start with having some counters for these events?
> >
> > This is because the AF_XDP completion queue descriptor format is only
> > a u64 address?
>
> Yeah. XDP_COPY mode has the descriptor validation which is exported via
> recvmsg errno, but zerocopy path seems to be too deep in the stack
> to report something back. And there is no place, as you mention,
> in the completion ring to report the status.
>
> > Could error conditions be reported on tx completion in the metadata,
> > using xsk_tx_metadata_complete?
>
> That would be one way to do it, yes. But then the error reporting depends
> on the metadata opt-in. Having a separate ring to export the errors,
> or having a v2 tx-completions layout with extra 'status' field would also
> work.
There are error counters for the non-metadata and offloading cases
above that can be retrieved with the XDP_STATISTICS getsockopt(). From
if_xdp.h:
struct xdp_statistics {
        __u64 rx_dropped; /* Dropped for other reasons */
        __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
        __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
        __u64 rx_ring_full; /* Dropped due to rx ring being full */
        __u64 rx_fill_ring_empty_descs; /* Failed to retrieve item
from fill ring */
        __u64 tx_ring_empty_descs; /* Failed to retrieve item from tx ring */
};
Albeit, these are aggregate statistics and do not say anything about
which packet that caused it. Works well for things that are
programming bugs that should not occur (such as rx_invalid_descs and
tx_invalid_descs) and requires the programmer to debug and fix his or
her program, but it does not work for requests that might fail even
though the program is correct and need to be handled on a packet by
packet basis. So something needs to be added for that as you both say.
Would prefer if we could avoid a v2 completion descriptor format or
another ring that needs to be checked all the time, so if we could
live with providing the error status in the metadata field of the
packet at completion time, that would be good. Though having the error
status in the completion ring would be faster as that cache line is
hot, while the metadata section of the packet is likely not at
completion time. So that speaks for a v2 completion ring format. Just
thinking out loud here.
> But this seems like something that should be handled separately? Because
> we'd have to teach all existing zc drivers to report those errors back
> instead of dropping these descriptors..
>
^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-06 10:06                 ` Magnus Karlsson
@ 2023-12-06 19:06                   ` Stanislav Fomichev
  2023-12-18  2:54                     ` Song, Yoong Siang
  0 siblings, 1 reply; 19+ messages in thread
From: Stanislav Fomichev @ 2023-12-06 19:06 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Willem de Bruijn, Florian Bezdeka, yoong.siang.song,
	Jesper Dangaard Brouer, davem, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jonathan Corbet, Bjorn Topel, magnus.karlsson,
	maciej.fijalkowski, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Lorenzo Bianconi, Tariq Toukan,
	Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, bpf@vger.kernel.org,
	xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
On 12/06, Magnus Karlsson wrote:
> On Tue, 5 Dec 2023 at 20:39, Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 12/05, Willem de Bruijn wrote:
> > > Stanislav Fomichev wrote:
> > > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka
> > > > <florian.bezdeka@siemens.com> wrote:
> > > > >
> > > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > > Jesper Dangaard Brouer wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Song Yoong Siang<yoong.siang.song@intel.com>
> > > > > > > > > ---
> > > > > > > > >   drivers/net/ethernet/stmicro/stmmac/stmmac.h      |  2 ++
> > > > > > > >
> > > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > > this.
> > > > > > > >
> > > > > > > > I propose driver igc and chip i225.
> > > > > >
> > > > > > Sure. I will include igc patches in next version.
> > > > > >
> > > > > > > >
> > > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > > mentions/documents these different hardware limitations.  It is natural
> > > > > > > > that different types of hardware have limitations.  This is a close-to
> > > > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > > > limitations we can expose this API without too many limitations for more
> > > > > > > > capable hardware.
> > > > > >
> > > > > > Sure. I will try to add hardware limitations in documentation.
> > > > > >
> > > > > > >
> > > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > > cannot be programmed.
> > > > > > >
> > > > > >
> > > > > > In current design, the xsk_tx_metadata_request() dint got return value.
> > > > > > So user won't know if their request is fail.
> > > > > > It is complex to inform user which request is failing.
> > > > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > > > >
> > > > >
> > > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > > never make it to the wire, right?
> > >
> > > Programmable behavior is to either drop or cap to some boundary
> > > value, such as the farthest programmable time in the future: the
> > > horizon. In fq:
> > >
> > >                 /* Check if packet timestamp is too far in the future. */
> > >                 if (fq_packet_beyond_horizon(skb, q, now)) {
> > >                         if (q->horizon_drop) {
> > >                                         q->stat_horizon_drops++;
> > >                                         return qdisc_drop(skb, sch, to_free);
> > >                         }
> > >                         q->stat_horizon_caps++;
> > >                         skb->tstamp = now + q->horizon;
> > >                 }
> > >                 fq_skb_cb(skb)->time_to_send = skb->tstamp;
> > >
> > > Drop is the more obviously correct mode.
> > >
> > > Programming with a clock source that the driver does not support will
> > > then be a persistent failure.
> > >
> > > Preferably, this driver capability can be queried beforehand (rather
> > > than only through reading error counters afterwards).
> > >
> > > Perhaps it should not be a driver task to convert from possibly
> > > multiple clock sources to the device native clock. Right now, we do
> > > use per-device timecounters for this, implemented in the driver.
> > >
> > > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
> >
> > Do we need to expose some generic netdev netlink apis to query/adjust
> > nic clock sources (or maybe there is something existing already)?
> > Then the userspace can be responsible for syncing/converting the
> > timestamps to the internal nic clocks. +1 to trying to avoid doing
> > this in the drivers.
> >
> > > > > That is clearly a situation that the user should be informed about. For
> > > > > RT systems this normally means that something is really wrong regarding
> > > > > timing / cycle overflow. Such systems have to react on that situation.
> > > >
> > > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > > somehow messed up' area :-(
> > > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > > will not give any visible signal back (besides driver potentially
> > > > spilling something into dmesg as it was in the mlx case).
> > > > We can probably start with having some counters for these events?
> > >
> > > This is because the AF_XDP completion queue descriptor format is only
> > > a u64 address?
> >
> > Yeah. XDP_COPY mode has the descriptor validation which is exported via
> > recvmsg errno, but zerocopy path seems to be too deep in the stack
> > to report something back. And there is no place, as you mention,
> > in the completion ring to report the status.
> >
> > > Could error conditions be reported on tx completion in the metadata,
> > > using xsk_tx_metadata_complete?
> >
> > That would be one way to do it, yes. But then the error reporting depends
> > on the metadata opt-in. Having a separate ring to export the errors,
> > or having a v2 tx-completions layout with extra 'status' field would also
> > work.
> 
> There are error counters for the non-metadata and offloading cases
> above that can be retrieved with the XDP_STATISTICS getsockopt(). From
> if_xdp.h:
> 
> struct xdp_statistics {
>         __u64 rx_dropped; /* Dropped for other reasons */
>         __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 rx_ring_full; /* Dropped due to rx ring being full */
>         __u64 rx_fill_ring_empty_descs; /* Failed to retrieve item
> from fill ring */
>         __u64 tx_ring_empty_descs; /* Failed to retrieve item from tx ring */
> };
> 
> Albeit, these are aggregate statistics and do not say anything about
> which packet that caused it. Works well for things that are
> programming bugs that should not occur (such as rx_invalid_descs and
> tx_invalid_descs) and requires the programmer to debug and fix his or
> her program, but it does not work for requests that might fail even
> though the program is correct and need to be handled on a packet by
> packet basis. So something needs to be added for that as you both say.
> 
> Would prefer if we could avoid a v2 completion descriptor format or
> another ring that needs to be checked all the time, so if we could
> live with providing the error status in the metadata field of the
> packet at completion time, that would be good. Though having the error
> status in the completion ring would be faster as that cache line is
> hot, while the metadata section of the packet is likely not at
> completion time. So that speaks for a v2 completion ring format. Just
> thinking out loud here.
In this case, maybe adding tx_over_horizon_dropped to XDP_STATISTICS
is all we need here? We can have some new api to query this horizon
per netdev.
^ permalink raw reply	[flat|nested] 19+ messages in thread
* RE: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
  2023-12-06 19:06                   ` Stanislav Fomichev
@ 2023-12-18  2:54                     ` Song, Yoong Siang
  0 siblings, 0 replies; 19+ messages in thread
From: Song, Yoong Siang @ 2023-12-18  2:54 UTC (permalink / raw)
  To: Stanislav Fomichev, Magnus Karlsson
  Cc: Willem de Bruijn, Bezdeka, Florian, Jesper Dangaard Brouer,
	davem@davemloft.net, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jonathan Corbet, Bjorn Topel, Karlsson, Magnus,
	Fijalkowski, Maciej, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Lorenzo Bianconi, Tariq Toukan,
	Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue,
	Jose Abreu, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, bpf@vger.kernel.org,
	xdp-hints@xdp-project.net,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kselftest@vger.kernel.org
Hi all,
Fyi, I submitted a patch [1] to enable tx metadata for igc driver as a preparation to add launch time to it.
After the patch is accepted, I will include igc driver in next version of launch time patch set.
[1] https://patchwork.kernel.org/project/netdevbpf/patch/20231215162158.951925-1-yoong.siang.song@intel.com/
Thanks & Regards
Siang
^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2023-12-18  2:54 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-03 16:51 [PATCH bpf-next v3 0/3] xsk: TX metadata Launch Time support Song Yoong Siang
2023-12-03 16:51 ` [PATCH bpf-next v3 1/3] xsk: add Launch Time HW offload to XDP Tx metadata Song Yoong Siang
2023-12-03 16:51 ` [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC Song Yoong Siang
2023-12-04 10:36   ` Jesper Dangaard Brouer
2023-12-04 11:54     ` [xdp-hints] " Florian Bezdeka
2023-12-04 14:54     ` Willem de Bruijn
2023-12-05 15:25       ` Song, Yoong Siang
2023-12-05 15:34         ` [xdp-hints] " Florian Bezdeka
2023-12-05 17:13           ` Stanislav Fomichev
2023-12-05 18:03             ` Willem de Bruijn
2023-12-05 19:39               ` Stanislav Fomichev
2023-12-05 20:01                 ` Willem de Bruijn
2023-12-06 10:06                 ` Magnus Karlsson
2023-12-06 19:06                   ` Stanislav Fomichev
2023-12-18  2:54                     ` Song, Yoong Siang
2023-12-03 16:51 ` [PATCH bpf-next v3 3/3] selftests/bpf: add Launch Time request to xdp_hw_metadata Song Yoong Siang
2023-12-04 10:16   ` Jesper Dangaard Brouer
2023-12-04 14:59     ` Willem de Bruijn
2023-12-05 14:55       ` [xdp-hints] " Song, Yoong Siang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).